Irohabook

Selecting Only Specific Columns with pandas in Python

Let's select specific columns from a pandas DataFrame.

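import pandas as pd

# Use the first CSV column (市区町村) as the row index.
df = pd.read_csv('population.csv', index_col=0)
# Passing a list of column names selects those columns.
rows = df[['男', '女']]
print(rows)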

The result looks like this.

             男        女
市区町村                   
千代田区    31,935   31,700
中央区     77,241   85,261
港 区    121,326  136,100
新宿区    173,743  172,419
文京区    105,462  116,027
台東区    101,917   97,375
墨田区    134,678  137,181
江東区    256,116  262,363
品川区    193,644  201,056
目黒区    132,206  147,136
大田区    362,653  366,881
世田谷区   431,026  477,881
渋谷区    108,768  117,826
中野区    167,378  164,280
杉並区    273,057  296,075
豊島区    145,334  144,174
北 区    174,910  177,066
荒川区    107,283  108,683
板橋区    278,662  288,228
練馬区    356,279  376,154
足立区    345,291  343,221
葛飾区    231,272  231,319
江戸川区   351,914  346,117
八王子市   281,506  280,954
立川市     91,460   92,362
武蔵野市    70,120   76,279
三鷹市     91,624   95,575
青梅市     67,393   66,693
府中市    130,582  129,429
昭島市     56,384   56,831
...        ...      ...
小金井市    59,955   61,488
小平市     95,312   98,284
日野市     92,983   92,410
東村山市    73,621   77,168
国分寺市    60,901   62,788
国立市     37,161   38,877
福生市     29,132   29,111
狛江市     40,005   42,476
東大和市    42,208   43,357
清瀬市     36,092   38,645
東久留米市   57,066   59,830
武蔵村山市   36,177   36,369
多摩市     72,927   75,818
稲城市     45,589   44,996
羽村市     28,251   27,356
あきる野市   40,304   40,547
西東京市    98,839  103,978
瑞穂町     16,922   16,291
日の出町     8,224    8,508
檜原村      1,100    1,117
奥多摩町     2,601    2,578
大島町      3,971    3,745
利島村        175      148
新島村      1,325    1,397
神津島村       975      923
三宅村      1,356    1,125
御蔵島村       167      150
八丈町      3,720    3,745
青ヶ島村        92       67
小笠原村     1,451    1,174

[62 rows x 2 columns]

Here, population.csv contains the data below.

市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
中央区,"91,852","162,502","77,241","85,261","15,916"
港 区,"145,865","257,426","121,326","136,100","12,638"
新宿区,"219,639","346,162","173,743","172,419","18,999"
文京区,"121,128","221,489","105,462","116,027","19,618"
台東区,"118,858","199,292","101,917","97,375","19,712"
墨田区,"150,855","271,859","134,678","137,181","19,743"
江東区,"267,262","518,479","256,116","262,363","12,910"
品川区,"220,678","394,700","193,644","201,056","17,281"
目黒区,"156,583","279,342","132,206","147,136","19,042"
大田区,"391,146","729,534","362,653","366,881","11,993"
世田谷区,"479,792","908,907","431,026","477,881","15,657"
渋谷区,"137,582","226,594","108,768","117,826","14,996"
中野区,"204,613","331,658","167,378","164,280","21,274"
杉並区,"321,531","569,132","273,057","296,075","16,710"
豊島区 ,"179,880","289,508","145,334","144,174","22,253"
北 区,"196,580","351,976","174,910","177,066","17,078"
荒川区,"115,944","215,966","107,283","108,683","21,256"
板橋区,"309,133","566,890","278,662","288,228","17,594"
練馬区,"370,567","732,433","356,279","376,154","15,234"
足立区,"346,739","688,512","345,291","343,221","12,930"
葛飾区,"233,158","462,591","231,272","231,319","13,293"
江戸川区,"342,016","698,031","351,914","346,117","13,989"
八王子市,"267,736","562,460","281,506","280,954","3,018"
立川市,"91,270","183,822","91,460","92,362","7,546"
武蔵野市,"76,765","146,399","70,120","76,279","13,333"
三鷹市,"93,665","187,199","91,624","95,575","11,401"
青梅市,"63,142","134,086","67,393","66,693","1,298"
府中市,"125,060","260,011","130,582","129,429","8,835"
昭島市,"53,827","113,215","56,384","56,831","6,529"
調布市,"118,804","235,169","114,909","120,260","10,898"
町田市,"195,643","428,685","209,971","218,714","5,991"
小金井市,"60,367","121,443","59,955","61,488","10,747"
小平市,"91,602","193,596","95,312","98,284","9,439"
日野市,"88,402","185,393","92,983","92,410","6,729"
東村山市,"72,676","150,789","73,621","77,168","8,797"
国分寺市,"60,111","123,689","60,901","62,788","10,793"
国立市,"37,728","76,038","37,161","38,877","9,330"
福生市,"30,506","58,243","29,132","29,111","5,733"
狛江市,"42,157","82,481","40,005","42,476","12,908"
東大和市,"38,852","85,565","42,208","43,357","6,376"
清瀬市,"35,454","74,737","36,092","38,645","7,306"
東久留米市,"54,257","116,896","57,066","59,830","9,076"
武蔵村山市,"31,640","72,546","36,177","36,369","4,735"
多摩市,"71,851","148,745","72,927","75,818","7,080"
稲城市,"39,991","90,585","45,589","44,996","5,041"
羽村市,"25,718","55,607","28,251","27,356","5,617"
あきる野市,"35,519","80,851","40,304","40,547","1,100"
西東京市,"97,350","202,817","98,839","103,978","12,877"
瑞穂町,"14,912","33,213","16,922","16,291","1,971"
日の出町,"7,383","16,732","8,224","8,508",596
檜原村,"1,181","2,217","1,100","1,117",21
奥多摩町,"2,685","5,179","2,601","2,578",23
大島町,"4,635","7,716","3,971","3,745",85
利島村,174,323,175,148,78
新島村,"1,381","2,722","1,325","1,397",99
神津島村,917,"1,898",975,923,102
三宅村,"1,620","2,481","1,356","1,125",45
御蔵島村,170,317,167,150,15
八丈町,"4,365","7,465","3,720","3,745",103
青ヶ島村,109,159,92,67,27
小笠原村,"1,492","2,625","1,451","1,174",25

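Source: 住民基本台帳による東京都の世帯と人口(町丁別・年齢別) (Households and Population of Tokyo Based on the Basic Resident Register, by town block and by age)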

pandas' read_csv reads the contents of a file into a DataFrame. Indexing the loaded DataFrame with a list of column names returns a new DataFrame that contains only those columns.
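As a minimal sketch of this behaviour, the snippet below inlines the first three rows of population.csv with io.StringIO (an assumption made only so that it runs without the file on disk) and checks what a list-based selection returns. The column name '子供' is hypothetical and appears only to show what happens when a requested column does not exist.

```python
import io

import pandas as pd

# First three rows of population.csv, inlined so the sketch is self-contained.
CSV_TEXT = '''市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
中央区,"91,852","162,502","77,241","85,261","15,916"
港 区,"145,865","257,426","121,326","136,100","12,638"
'''

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# A list of column names returns a DataFrame with just those columns,
# in the order given in the list.
subset = df[['男', '女']]
print(type(subset))
print(subset.shape)

# Asking for a column that is not in the file raises KeyError.
# '子供' is a hypothetical name used only for this demonstration.
try:
    df[['男', '子供']]
except KeyError as err:
    print('KeyError:', err)
```

The original df is left unchanged; the selection produces a separate object, so the remaining columns are still available afterwards.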

Selecting only a single column

Let's use pandas to select only the total population of each municipality from population.csv.

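import pandas as pd

# Use the first CSV column (市区町村) as the row index.
df = pd.read_csv('population.csv', index_col=0)
# A single column name (not a list) selects one column.
rows = df['総数']
print(rows)
# Show the type of the selected object.
print(type(rows))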

This time only one column, 総数 (total population), is selected. The result looks like this.

市区町村
千代田区      63,635
中央区      162,502
港 区      257,426
新宿区      346,162
文京区      221,489
台東区      199,292
墨田区      271,859
江東区      518,479
品川区      394,700
目黒区      279,342
大田区      729,534
世田谷区     908,907
渋谷区      226,594
中野区      331,658
杉並区      569,132
豊島区      289,508
北 区      351,976
荒川区      215,966
板橋区      566,890
練馬区      732,433
足立区      688,512
葛飾区      462,591
江戸川区     698,031
八王子市     562,460
立川市      183,822
武蔵野市     146,399
三鷹市      187,199
青梅市      134,086
府中市      260,011
昭島市      113,215
          ...   
小金井市     121,443
小平市      193,596
日野市      185,393
東村山市     150,789
国分寺市     123,689
国立市       76,038
福生市       58,243
狛江市       82,481
東大和市      85,565
清瀬市       74,737
東久留米市    116,896
武蔵村山市     72,546
多摩市      148,745
稲城市       90,585
羽村市       55,607
あきる野市     80,851
西東京市     202,817
瑞穂町       33,213
日の出町      16,732
檜原村        2,217
奥多摩町       5,179
大島町        7,716
利島村          323
新島村        2,722
神津島村       1,898
三宅村        2,481
御蔵島村         317
八丈町        7,465
青ヶ島村         159
小笠原村       2,625
Name: 総数, Length: 62, dtype: object
<class 'pandas.core.series.Series'>

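The municipality names are printed as well, but rows is one-dimensional data. Its type is pandas' Series, which plays a role similar to a plain Python list. The two are not the same, of course: a Series also carries an index (here, the municipality names) and a dtype.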
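The difference between a Series and a list can be sketched as follows. This is an illustrative example only: it again inlines three rows of population.csv with io.StringIO rather than reading the file, and the variable names values and frame are introduced here for the demonstration.

```python
import io

import pandas as pd

# First three rows of population.csv, inlined so the sketch is self-contained.
CSV_TEXT = '''市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
中央区,"91,852","162,502","77,241","85,261","15,916"
港 区,"145,865","257,426","121,326","136,100","12,638"
'''

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
rows = df['総数']

# A Series pairs every value with an index label, so a value can be
# looked up by label as well as by position.
print(rows['千代田区'])   # label-based access
print(rows.iloc[0])       # position-based access, like values[0] on a list

# Converting to a plain list keeps the values and drops the labels.
values = rows.tolist()
print(values)

# Wrapping the column name in a list keeps the result two-dimensional:
# a DataFrame with one column instead of a Series.
frame = df[['総数']]
print(type(frame), frame.shape)
```

Note that the values printed here are strings such as '63,635', which matches the dtype: object shown in the output above.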

A pandas Series comes with standard methods such as max and min built in. The next article computes the maximum, minimum, variance, and other statistics of the population.
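One caveat before calling these methods: the output above reports dtype: object, meaning that values such as "63,635" were read as strings, so max and min would compare text rather than numbers. A possible way around this, sketched here under the assumption that the commas are thousands separators, is to pass thousands=',' to read_csv. The three rows below are taken from population.csv and inlined with io.StringIO so that the example runs on its own; the results apply to this excerpt only.

```python
import io

import pandas as pd

# Three rows of population.csv, inlined so the sketch is self-contained.
CSV_TEXT = '''市区町村,世帯数,総数,男,女,人口密度
千代田区,"35,830","63,635","31,935","31,700","5,458"
世田谷区,"479,792","908,907","431,026","477,881","15,657"
青ヶ島村,109,159,92,67,27
'''

# thousands=',' tells read_csv to treat commas inside numbers as
# thousands separators, so the columns are parsed as integers.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, thousands=',')
total = df['総数']

print(total.dtype)
print(total.max())     # 908907
print(total.min())     # 159
print(total.idxmax())  # 世田谷区 (index label of the largest value)
print(total.idxmin())  # 青ヶ島村 (index label of the smallest value)
```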
