python source code of OSM_data

python-urbanPlanning-master
- 24.生活圈_03_信息熵与均衡度
  - entropyBoxPlot_Statistics.py
  - entropyCalBundle.py
  - data
    - POIPtsProjection
      - 100_POI.sbx
      - 230_POI.prj
      - 20_POI.prj
      - 120_POI.prj
      - 110_POI.sbx
      - 240_POI.prj
      - 110_POI.prj
      - 200_POI.sbx
      - 170_POI.sbx
      - 100_POI.prj
      - 180_POI.prj
      - 160_POI.prj
      - 160_POI.sbx
      - 240_POI.sbx
      - 140_POI.prj
      - 230_POI.sbx
      - 130_POI.sbx
      - 190_POI.prj
      - 140_POI.sbx
      - 190_POI.sbx
      - 220_POI.sbx
      - 210_POI.sbx
      - 210_POI.prj
      - 130_POI.prj
      - 250_POI.prj
      - 20_POI.sbx
      - 150_POI.prj
      - 150_POI.sbx
      - 120_POI.sbx
      - 180_POI.sbx
      - 170_POI.prj
      - 220_POI.prj
      - 200_POI.prj
  - README.md
- 19_ROS-Kinetic深度相机3DSLAM三维点云建筑空间
  - ROS-Kinetic3DSLAM.html
  - kinect3Dcould.py
  - README.md
- draft_richie bao
  - stage_01_Chicago Spatial structure.md
  - A-driverless city project_spatial points pattern related.md
  - notice.md
  - stage_02_Chicago Spatial Structure.md
  - stage_03_Chicago Spatial Structure.md
  - README.md
  - www related_01-docsify install instruction.md
- 06_(outliers)异常值处理
  - isoutlier.py
  - README.md
  - GPSData.py
- 14_基于DBSCAN密度空间聚类林缘线生成设计
  - vegetationPlanNote.gh
  - vegetation.3dm
  - predUnique.txt
  - vegetationCluster.py
  - vegetitionCluster.txt
  - README.md
  - vegetitionPred.txt
- 09_机器学习_聚类_城市色彩_B_印象
  - comparingDiffClusteringAlgorithms.py
  - README.md
  - cityColorImpression.py
- 17_景观质量视觉评估预测
  - opencv_py.py
  - gaussionKernal.py
  - siftB.py
  - classOpp.py
  - README.md
- 07_城市色彩_A
  - pedestrianDetection.py
  - numpy-a.py
  - README.md
- 20_NLP_LDA主题建模_提取说明书关键字
  - defaultdict.py
  - NLP_gensim.py
  - testingNLP.py
  - README.md
- 02_python解释器
  - README.md
- images
- 34_Chicago_04_SVF计算以及内存管理
  - d_memoryFuncMonitor.html
  - SVF_trial.py
  - splitARasterInSeveralTiles.py
  - SVF_array_Final_bigRaster.py
  - SVF_array_Final_adj_big_blocks.py
  - README.md
- 25_生活圈_04_相关系数热力图(簇行业类)与批量图片自动排版
  - numpyTranspose.py
  - data
    - POI__partialCorrelations.npy
  - readPOICTable_partialCorrle.py
  - README.md
  - picArranging.py
- 45.Chicago_15_时空数据_02_时空分布动态
  - GeospatIal Distribution DYnamics.py
  - data
    - COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv
  - README.md
- 32_Chicago_02_城市空间结构分析_连接度
  - connectivity.py
  - data
  - README.md
- 43.Chicago_13_无人驾驶城市_04_数据特征描述_B
  - showMatLabFig._spatioTemporal.py
  - data
    - 02
      - NL.fig
      - LM.fig
    - 01
      - LM.fig
  - README.md
- 37_Chicago_07_建筑高度分布结构
  - pointsClustering.py
  - rasterBuildingHeightZSplit_reclassify.py
  - rasterMosaic_rasterio.py
  - generateDTMBUidingHeight.py
  - README.md
  - rasterBuildingHeightZSplit.py
  - pdal_las_lidar.py
  - rasterClustering.py
  - interpolate2D_3D.py
- 01_Python在设计领域
  - README.md
- LICENSE
- 16_Flask构建实验用网络应用平台
  - README.md
  - FlaskNote01.html
- 33_Chicago_03_雷达.las数据处理
  - rasterMosaic_rasterio.py
  - README.md
  - pdal_las_lidar.py
  - pdalBasis.py
- 12_地形生成与回归预测
  - explanatoryVariable.txt
  - targetVariable.txt
  - poiRegression.py
  - poiRegression.gh
  - paraPolylineY.txt
  - README.md
  - rf_terrainPred.py
  - paraPolylinePred.txt
  - paraPolyline.txt
  - predictedFeatures.txt
- 21_提取.jpg等图像（热力图）格式的数据为地理信息数据（raster）
  - Heatmap
    - 13.tif.ovr
    - rode_project.sbx
    - rode_project.shx
    - 13.tif.aux.xml
    - rode_project.prj
    - 13.png.aux.xml
    - 13.tfw
    - 13.pgwx
  - README.md
  - heatmapValExtraction.py
- .gitattributes
- 42.Chicago_11_无人驾驶城市_03_交互式操作-pygame-pytorch
  - data
    - phmi_label.pkl
  - README.md
  - model
  - interactivePattern.py
- 23_生活圈_02_通过计算曲线拐点找到特征层级
  - kneePts_LineGraph_readingExcel.py
  - kneePts_LineGraph.py
  - data_generator.py
  - knee_locator.py
  - data
    - POI__LineGraph.npy
    - singlePts.xlsx
    - POI__partialCorrelations.npy
    - POI__eps.npy
    - 160_poi_portion_TableToExcel.xls.baiduyun.uploading.cfg
    - poiStatistics.xlsx
  - README.md
  - boxPlot_clustering.py
- 49.Chicago_19_时空数据_05_dash 基于WEB 图表分析
  - dash_covid_19
    - app.py
    - constants.py
    - data
      - Boundaries - ZIP Codes
        - geo_export_1a9a53ff-8090-4a1a-85ce-ac92bd036028.prj
        - geo_export_1a9a53ff-8090-4a1a-85ce-ac92bd036028.dbf
    - assets
      - base.css
      - oil-and-gas-ternary.css
  - README.md
- 03_用列表存储的坐标_数据采集
  - dataCapturing.py
  - extractData.py
  - baiduMapPoiLandscape.csv
  - README.md
  - readCSVcoordi.py
  - conversionofCoordi.py
  - data.txt
- 41.Chicago_11_无人驾驶城市_02_pytorch深度学习模型-beta
  - phmiPatternModel.py
  - toOnehotEncoder.py
  - pytorchModelsForAVsCity.py
  - PHmi_landmarks_model.py
  - data
    - phmi_label.pkl
  - README.md
  - phmiData2rasterBunch.py
- 46.Chicago_16_时空数据_03_全局-局部空间自相关分析
  - data
    - COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv
  - Exploratory Spatial Data Analysis in PySAL.py
  - README.md
- 05_基于GPS调研与数据读取
  - southernInternship
    - default_20170720081441
      - bak
        - default_20170720081441.kml
      - default_20170720081441.kml
  - README.md
  - GPSData.py
- 38_Chicago_08_数据图表描述
  - populationLink.py
  - SVF_array_Final_bigRaster.py
  - valueWeightStatistic_merge.py
  - data
  - README.md
  - parkDataVisulization.py
  - statistics_rasterInpolygon.py
- 36.Chicago_06_网络结构分析_networkx
  - data
  - README.md
  - vectorSpatialAnalysis.py
- 10_独立性检验(列联表)与poi空间分布结构
  - dataScraping_JsonCsv_batch.py
  - ifStatements.py
  - dbscan.py
  - xianPOI_36
  - README.md
  - poiStructure.py
- 31_Chicago_01_城市空间结构分析_边缘_物质
  - Chicago_SDAM_basis.py
  - data_generator.py
  - knee_locator.py
  - data
  - README.md
- 18_实验用网络应用平台的部署
  - README.md
  - LAevaluationA_simplify
    - local_data.db
    - config.py
    - feature_map.pkl
    - dist
      - extension
        - bmap.js.map
        - bmap.min.js
        - bmap.js
        - dataTool.min.js
        - dataTool.js.map
        - dataTool.js
    - LICENSE
    - imgPred_recognizer.py
    - templates
      - results.html
      - eval.html
      - testing.html
      - base.html
      - imgprediction.html
      - index.html
    - manage.py
    - exts.py
    - models.py
    - note.py
    - lAEVAL.py
    - requirementsbak.txt
    - README.md
    - migrations
      - env.py
      - versions
        - __pycache__
          - 18d0d2499266_initial_migration.cpython-36.pyc
        - 18d0d2499266_initial_migration.py
      - __pycache__
        - env.cpython-36.pyc
      - README
      - alembic.ini
      - script.py.mako
    - imageProcessing.py
    - imgPred_buildFeatures.py
    - requirements.txt
    - imgPred_training.py
    - conversionofCoordi.py
    - static
      - images
        - imgpred
        - imresize
      - css
        - index.css
  - ubuntuService.html
- README.md
- 39.Chicago_09_距离权重的环境指数
  - distanceWeightStatistic.py
  - distanceWeightCalculation_raster2Polygon.py
  - valueWeightStatistic_merge.py
  - README.md
  - distanceWeightCalculation_polygon2polygon.py
- 04_建筑外环境实验数据处理
  - microClimateData.py
  - README.md
  - positions_9_34二十五日动线二.txt
- 11_森林的蔓延_SIR模型与卷积
  - 12mul12convolve_disperse.py
  - sf
    - xa_tourism_w.shx
    - xa_tourism_w.prj
    - xa_tourism_w.shp
    - xa_tourism_w.dbf
  - tif_RSImages
    - testClip6.tif.xml
    - testClip6.tif.aux.xml
    - testClip7.tfw
    - testClip6.tif
    - testClip6.tif.vat.dbf
    - testClip6.tif.ovr
  - 12mul12Pixal.bmp
  - OrganicEvolution_VegetationSystem.py
  - README.md
  - xa_gdal.py
  - 12mul12convolve.py
- 50.Chicago_20_无人驾驶城市_06_3D 参数化模型(grasshopper)
  - driverlessCityProject_2grasshopper.py
  - driverlessCityProject_import data.gh
  - driverlessCityProject_spatialPointsPattern_association_basic.py
  - data
    - 04-10-2020_312LM_LM.fig
    - location.csv
    - phmi.csv
    - landmarks.csv
  - README.md
- 15_将规划设计信息写入SQLite数据库分享与收集
  - SQliteNote01.html
  - README.md
  - laeval_cadesign_simplify
    - laeval.db
    - config.py
    - test
      - test-a.py
      - bootstrap.html
      - base.html
      - test-a.html
    - templates
      - results.html
      - eval.html
      - base.html
      - index.html
    - SQLiteNote.ipynb
    - manage.py
    - exts.py
    - laeval_cadesign.py
    - models.py
    - migrations
      - env.py
      - versions
        - __pycache__
          - 538119d3e42a_initial_migration.cpython-36.pyc
        - 538119d3e42a_initial_migration.py
      - __pycache__
        - env.cpython-36.pyc
      - README
      - alembic.ini
      - script.py.mako
    - imageProcessing.py
    - conversionofCoordi.py
    - static
      - images
        - imgpred
        - imagesA
        - 2017.12.15-lmk-S-WTSolutions-GPS.kmz
        - 2017.12.15-GF-S-WTSolutions-GPS.csv
        - 2017.12.15-lmk-S-WTSolutions-GPS.csv
        - 2017.12.15-GF-S-WTSolutions-GPS.kmz
        - imresize
      - favicon.ico
      - css
        - index.css
- 48.Chicago_18_无人驾驶城市_05_空间点模式
  - driverlessCityProject_spatialPointsPattern_association_corr.py
  - driverlessCityProject_spatialPointsPattern_association_basic.py
  - data
    - 05
      - 04-10-2020_NL.fig
      - 04-10-2020_LM.fig
    - 02
      - NL.fig
      - LM.fig
    - 06
      - 04-10-2020_312LM_NL.fig
      - 04-10-2020_312LM_LM.fig
  - README.md
- 13_ 基于回归预测NDVI修复地生态廊道的构建
  - ecoRecoveryGIS.py
  - rf_NDVIEvolution.py
  - README.md
- 08_基于poi数据生成kml文件与描述性统计
  - ix_fun.py
  - dataCapturing_JsonCsv.py
  - poiXianOldTown
    - poi_9_media.csv
    - poi_5_spot.json
    - poi_7_sports.json
    - poi_9_media.json
    - poi_8_education.csv
    - poi_8_education.json
  - statisticsA.py
  - csv2json.py
  - README.md
  - conversionofCoordi.py
- 35.Chicago_05_OSM数据处理与空间结构
  - kneePts_LineGraph.py
  - pointsToRasterBundle.py
  - OSM_data_cluster.py
  - shpPointsReadAndOSMAnalysis.py
  - README.md
  - shp2OsmosisPolygon.py
- 26-30_城市热环境_01-05_基于LST
  - LST.py
  - data
    - NDVI201808Cla.dbf
  - README.md
  - Fit_estimators.py
- 47.Chicago_17_时空数据_04_不平等性与隔离
  - data
    - Public_Health_Statistics-_Selected_public_health_indicators_by_Chicago_community_area.csv
    - COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv
  - README.md
  - equality and segregation.py
- 22_城市圈_01_连续聚类与数据分析和数据保存
  - results_data
    - POI__LineGraph.npy
    - POI__partialCorrelations.npy
    - POI__eps.npy
  - pointsToRasterBundle.py
  - collections.py
  - npRavelAndFlatten.py
  - rasterPTSextraction_statistic_poi.py
  - data
    - hamlet.txt
    - xianPOI_36
  - README.md
  - numpySave.py
  - conversionofCoordi.py
- 40.Chicago_10_无人驾驶城市_01_数据特征描述
  - showMatLabFig.py
  - data
    - LandmarkMap.fig
  - README.md
- 44.Chicago_14_时空数据_01_时间序列
  - Time Series Analysis.py
  - data
    - COVID-19_Daily_Cases_and_Deaths.csv
  - README.md
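The script that follows picks DBSCAN distances in metres and converts them to degrees with degree = meter / (2π × 6378137) × 360, because the POI points stay in longitude/latitude. The snippet below is a small check of that arithmetic (6378137 m is the WGS84 equatorial radius the script uses). It also shows a caveat the conversion ignores: a degree of longitude shrinks with cos(latitude), so at Chicago's latitude (about 41.9° N, used here only as an illustrative value) a Euclidean eps in degrees covers fewer metres east-west than north-south. The helper names are hypothetical and not part of the original code.

```python
import math

# WGS84 equatorial radius in metres, the same constant the script uses
EARTH_RADIUS_M = 6378137.0
# metres spanned by one degree along a great circle (about 111,319 m)
METERS_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360


def meters_to_degrees(meters):
    """Metres -> degrees, the conversion the script applies to each eps value."""
    return meters / METERS_PER_DEGREE


def degrees_to_meters(degrees):
    """Degrees -> metres, the inverse conversion."""
    return degrees * METERS_PER_DEGREE


def lon_degree_length_m(lat_deg):
    """Approximate length in metres of one degree of longitude at lat_deg (spherical model)."""
    return METERS_PER_DEGREE * math.cos(math.radians(lat_deg))


# eps values in metres, as in the script: 20, 30, ..., 510
eps_m = list(range(20, 520, 10))
eps_deg = [meters_to_degrees(m) for m in eps_m]
```

For example, degrees_to_meters(0.0008) gives roughly 89 m, which matches the "about 90 m" in the script's DBSCAN comment, while lon_degree_length_m(41.9) is roughly 83 km rather than 111 km.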

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 25 17:27:01 2019

@author: Richie Bao-caDesign设计(cadesign.cn).Chicagoo
"""
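import osmium as osm
import pandas as pd
import os, sys, math, time  # sys is needed for sys.exit() in point2Shp
from osgeo import gdal, ogr, osr  # the top-level "import gdal/ogr/osr" form is deprecated; recent GDAL releases only provide the osgeo package
from tqdm import tqdm
import numpy as np
from sklearn import cluster
from collections import Counter  # counts hashable items; used here to tally points per cluster label
from sklearn import preprocessing
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.axisartist.axislines import Subplot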

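mpl.rcParams['font.sans-serif']=['STXihei']  # font that can render Chinese characters in plot labels (must be installed)

gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "YES")  # handle non-ASCII (e.g. Chinese) characters in file paths
gdal.SetConfigOption("SHAPE_ENCODING", "CP936")  # encoding of shapefile attribute tables (CP936 = GBK, for Chinese text)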


# Read OSM node data and record the required fields for each tag. See the pyosmium documentation for details.
class OSMHandler(osm.SimpleHandler):
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.osm_data = []
        # self.count = [0, 0, 0]

    def tag_inventory(self, elem, elem_type):
        # One row per tag: untagged nodes produce no rows, and a node with n tags produces n rows
        for tag in elem.tags:
            self.osm_data.append([elem_type, 
                                   elem.id, 
                                   elem.version,
                                   elem.visible,
                                   pd.Timestamp(elem.timestamp),
                                   elem.uid,
                                   elem.user,
                                   elem.changeset,
                                   len(elem.tags),                                   
                                   tag.k, 
                                   tag.v,
                                   elem.location.lon,
                                   elem.location.lat
                                    ])

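    def node(self, n):
        self.tag_inventory(n, "node")

    # Ways and relations have no .location attribute, so tag_inventory() would need
    # adapting before these callbacks could be enabled.
    # def way(self, w):
    #     self.tag_inventory(w, "way")

    # def relation(self, r):
    #     self.tag_inventory(r, "relation")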
        
    # def node(self, n):
    #     self.count[0] += 1
    # def way(self, w):
    #     self.count[1] += 1
    # def relation(self, r):
    #     self.count[2] += 1
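'''DBSCAN density-based spatial clustering of all POI points (despite the function name, no affinity propagation is used)'''
def affinityPropagationForPoints(dataArray,epsValue):
    print("--------------------Clustering")
    data=dataArray
    t1=time.time()
    # eps: neighbourhood distance threshold. The data are lon/lat in degrees, so eps is in degrees too; convert between
    # metres and degrees with meter = degree*(2*math.pi*6378137.0)/360 and degree = meter/(2*math.pi*6378137.0)*360.
    # For example, eps=0.0008 is about 90 m, so a POI within about 90 m of a core point joins that point's cluster.
    # Away from the equator a degree of longitude is shorter (about 0.74x at Chicago), so measured in metres the
    # neighbourhood is narrower east-west than north-south.
    # min_samples: number of samples in a point's neighbourhood (the point itself included) for it to be a core point.
    # Tune both parameters against the data until the clustering looks reasonable.
    db=cluster.DBSCAN(eps=epsValue,min_samples=3,metric='euclidean')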
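    y_db=db.fit_predict(data)  # cluster label for each point; -1 marks noise
    t2=time.time()
    tDiff_af=t2-t1 # time spent on clustering
    print(tDiff_af)

    pred=y_db
    print(pred,len(np.unique(pred)))  # show the labels and the number of distinct labels (noise label -1 included)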
    
#    t3=time.time()
#    plt.close('all')
#    plt.figure(1,figsize=(20,20))
#    plt.clf()
#    cm=plt.cm.get_cmap('nipy_spectral')  # built-in colormap
#    plt.scatter(data[...,0],data[...,1],s=10,alpha=0.8,c=pred,cmap=cm) # colour each point by its predicted label
#    plt.show()
#    t4=time.time()
#    tDiff_plt=t4-t3  # plotting time
#    print(tDiff_plt)
    print("-------------------cluster Finishing")
    return pred,np.unique(pred)  # DBSCAN label per point, and the distinct cluster labels

'''Write the clustered POI points to a shapefile layer for use in GIS.'''
def point2Shp(df_osm,valueArray,fn,pt_lyrName_w,ref_lyr=False):
    ds=ogr.Open(fn,1)
#    '''Reference layer, used as a template for the spatial reference and attribute fields'''
#    ref_lyr=ds.GetLayer(ref_lyr)
#    ref_sr=ref_lyr.GetSpatialRef()
#    print(ref_sr)
#    ref_schema=ref_lyr.schema # list attribute field names and types
#    for field in ref_schema:
#        print(field.name,field.GetTypeName())
        
    '''Create the output data source (a directory of shapefiles)'''
    sf_driver=ogr.GetDriverByName('ESRI Shapefile')
    sfDS=os.path.join(fn,r'sf')
#    if os.path.exists(sfDS):
#        sf_driver.DeleteDataSource(sfDS)
    pt_ds=sf_driver.CreateDataSource(sfDS)
    if pt_ds is None:
        sys.exit('Could not create {0}'.format(sfDS))
        
    '''Create a new layer, replacing any existing layer with the same name'''
    if pt_ds.GetLayer(pt_lyrName_w):
        pt_ds.DeleteLayer(pt_lyrName_w)

    spatialRef = osr.SpatialReference()
    spatialRef.SetWellKnownGeogCS("WGS84") # note: the geographic CRS is set directly to WGS84 instead of being taken from a reference layer

    pt_lyr=pt_ds.CreateLayer(pt_lyrName_w,spatialRef,ogr.wkbPoint)
#    pt_lyr=pt_ds.CreateLayer(pt_lyrName_w,ref_sr,ogr.wkbPoint)

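    '''Define the attribute fields: names, types, and widths'''
#    pt_lyr.CreateFields(ref_schema)
    LatFd=ogr.FieldDefn("origiLat",ogr.OFTReal)
    LatFd.SetWidth(20)
    LatFd.SetPrecision(7) # 7 decimals (about 1 cm); a precision of 3 would round coordinates to 0.001 degree (about 100 m)
    pt_lyr.CreateField(LatFd)
    LatFd.SetName("origiLong")
    pt_lyr.CreateField(LatFd)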
    
#    pt_lyr.CreateFields(ref_schema)
    preFd=ogr.FieldDefn("type",ogr.OFTString)
    pt_lyr.CreateField(preFd)
    preFd.SetName("tagkey")
    pt_lyr.CreateField(preFd)
    preFd.SetName("tagvalue")
    pt_lyr.CreateField(preFd)
    
    preFd=ogr.FieldDefn("cluster",ogr.OFTInteger)
    pt_lyr.CreateField(preFd)
    # preFd.SetName("cluster")
    # pt_lyr.CreateField(preFd)    
    
#    stationName=ogr.FieldDefn("stationN",ogr.OFTString)
#    pt_lyr.CreateField(stationName)    
    
#    preFd.SetName("ObservTime")
#    pt_lyr.CreateField(preFd)  
#   
     
    '''Create one empty feature from the layer definition; it is reused for every point'''
    print(pt_lyr.GetLayerDefn())
    pt_feat=ogr.Feature(pt_lyr.GetLayerDefn())
   
#    idx=0
    for i in tqdm(range(valueArray.shape[0])):  # one feature per point
#        print(key)
        '''Set the geometry'''
        #pt_ref=feat.geometry().Clone()
        # converCoordiGCJ=cc.bd09togcj02(dataBunch.data[i][1],dataBunch.data[i][0])
        # converCoordiGPS84=cc.gcj02towgs84(converCoordiGCJ[0],converCoordiGCJ[1])
#        print(wdCoordiDicSingle[key][1],wdCoordiDicSingle[key][0])
#        print(converCoordiGPS84[0], converCoordiGPS84[1])
        wkt="POINT(%f %f)" %  (df_osm["lon"][i], df_osm["lat"][i])
#        wkt="POINT(%f %f)" %  (dataBunch.data[i][0], dataBunch.data[i][1])
        newPt=ogr.CreateGeometryFromWkt(wkt) # build the point from WKT (x = longitude, y = latitude)
        pt_feat.SetGeometry(newPt)
        '''Set the attribute values'''
#        for i_field in range(feat.GetFieldCount()):
#            pt_feat.SetField(i_field,feat.GetField(i_field))
        pt_feat.SetField("origiLat",df_osm["lat"][i])
        pt_feat.SetField("origiLong",df_osm["lon"][i])
        
        
#        print(wdDicComplete[key]['20140901190000'])
        pt_feat.SetField("type",df_osm["type"][i])
        pt_feat.SetField("tagkey",df_osm["tagkey"][i])
        pt_feat.SetField("tagvalue",df_osm["tagvalue"][i])
        pt_feat.SetField("cluster",int(valueArray[i]))
#        print(idx,int(valueArray[idx]),pt_ref.GetX())
#        idx+=1

        '''Write a feature from the geometry and field values set above (one feature per point)'''
        pt_lyr.CreateFeature(pt_feat)
    pt_ds=None  # close the output data source so the shapefile is flushed to disk
    del ds

'''Draw violin and box plots of the robust-scaled cluster sizes for each eps'''
def violinPlot(all_data,eps):
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18*2, 8*2))
    # plot violin plot
    axes[0].violinplot(all_data,showmeans=False,showmedians=True)
    axes[0].set_title('Violin plot',fontsize=30)
    
    # plot box plot
    axes[1].boxplot(all_data,flierprops={'marker':'o','markerfacecolor':'red','color':'black'})
    axes[1].set_title('Box plot',fontsize=30)
   
    # adding horizontal grid lines
    for ax in axes:
        ax.yaxis.grid(True)
        ax.set_xticks([y + 1 for y in range(len(all_data))])
        ax.set_xlabel('Clustering distance eps (m)',fontsize=30)
        ax.set_ylabel('Points per cluster (robust-scaled)',fontsize=30)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.tick_params(labelsize=20)
        
    # add x-tick labels
    plt.setp(axes, xticks=[y + 1 for y in range(len(all_data)) if y%2==0],xticklabels=[eps[i] for i in range(len(eps)) if i%2==0])
    fig.autofmt_xdate()
#    plt.tick_params(labelsize=20)
#    plt.rcParams('font.sans-serif')=['STXihei'] # not needed: mpl.rcParams['font.sans-serif']=['STXihei'] is already set at the top
    plt.savefig(os.path.join(savingFig,"violinPlot"))  # savingFig is a module-level global set in __main__
    plt.show()

'''Draw a line graph of the total number of clusters for each eps'''
def lineGraph(all_data,eps):
    fig = plt.figure(1, (18*2, 9*2))
    ax = Subplot(fig, 111)
    fig.add_subplot(ax)

    ax.plot(eps,all_data, 'ro-',label='Total number of POI clusters')
    ax.axis["right"].set_visible(False)
    ax.axis["top"].set_visible(False)
    ax.set_xlabel('Clustering distance eps (m)',fontsize=30)
    ax.set_ylabel('Number of clusters',fontsize=30)
    ax.tick_params(labelsize=20)

    plt.legend()
    plt.savefig(os.path.join(savingFig,"lineGraph"))  # savingFig is a module-level global set in __main__
    plt.show()

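'''Save and load data with numpy'''
def savingData(fp,fn,data):
    np.save(os.path.join(fp,fn),data) # save a single array to a binary .npy file

def savingDataZ(fp,fn,data):
    np.savez(os.path.join(fp,fn),dic=data)   # save to an .npz archive under the key 'dic'; non-array objects such as dicts are pickled

# Load data saved with numpy. Object (pickled) arrays, e.g. a ragged list or a dict, need allow_pickle=True.
def readingData(fp,fn,allow_pickle=False):
    readedData=np.load(os.path.join(fp,fn+".npy"),allow_pickle=allow_pickle)
    return readedData
def readingDataz(fp,fn,allow_pickle=False):
    readedData=np.load(os.path.join(fp,fn+".npz"),allow_pickle=allow_pickle)
    return readedData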



'''Batch computation: cluster once per eps, write each result to a shapefile layer, and collect statistics'''
def loopCalculate(df_osm,epsDegree,fn,eps):
    xyzArray=pd.DataFrame({"lon": df_osm['lon'] , "lat": df_osm['lat'] }).to_numpy()
    robustScaleList=[]
    totalNumber=[]
    CTableDic={}
    partialCorrelationsList=[]
    counter=0

    # cluster once for each distance
    for i in range(len(epsDegree)):
        pred,predLable=affinityPropagationForPoints(xyzArray,epsDegree[i]) # cluster; returns the label per point and the distinct labels

        pt_lyrName_w=r'%s_POI'%eps[i] # output layer name, e.g. "20_POI"
        point2Shp(df_osm,pred,fn,pt_lyrName_w)
        print("\n%s has been written to disk"%pt_lyrName_w)

        counterData=Counter(pred)   # number of points per cluster label (the noise label -1 is counted as well)
#        print(counterData)
        counterValue=np.array(list(counterData.values()))
        cvFloat=counterValue.astype(float)
        robustScale=preprocessing.robust_scale(cvFloat.reshape(-1,1))  # cluster sizes contain outliers, which distort mean/variance scaling, so scale with the median and interquartile range instead
        cvF=robustScale.ravel() # flatten to 1-D; ravel() returns a view when possible, whereas flatten() always returns a copy
        robustScaleList.append(cvF)
        totalNumber.append(len(predLable)) # number of distinct labels (includes -1 if any point is noise)
        
    return robustScaleList,totalNumber
        

if __name__=="__main__": 
    # osmChicagoFn=r"D:\data\data_01_Chicago\osm\map_exercise.osm"
    osmChicagoFn=r"D:\data\data_01_Chicago\osm\ChicagoOSM.osm"
    osmhandler = OSMHandler()
    # scan the input file and fills the handler list accordingly
    osmhandler.apply_file(osmChicagoFn)
    
    # transform the list into a pandas DataFrame
    data_colnames = ['type', 'id', 'version', 'visible', 'ts', 'uid','user', 'chgset', 'ntags', 'tagkey', 'tagvalue','lon','lat']
    df_osm = pd.DataFrame(osmhandler.osm_data, columns=data_colnames) # one row per (node, tag) pair, with the named columns
    
    eps=list(range(20,520,10)) # clustering distances in metres: 20, 30, ..., 510

    # the points are still lon/lat degrees, so convert each eps from metres to degrees
    epsDegree=np.array(eps)/(2 * math.pi * 6378137.0) * 360
    fn=r"D:\data\data_01_Chicago\QGisDat\OSMPointsCluster"    
    robustScaleList,totalNumber=loopCalculate(df_osm,epsDegree,fn,eps)
    # plot the results to see how the clustering changes with eps
    savingFig=r"D:\data\data_01_Chicago\results_figure"
    violinPlot(robustScaleList,eps) # violin and box plots
    lineGraph(totalNumber,eps) # line graph
    
    
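    # Save data. Only the data used for the plots are saved here.
    savingFp=r'D:\data\data_01_Chicago\results_data_save' # keep results on disk for later use and to avoid recomputation

    savingFn=r'POI_violin'
    tempData=np.array(robustScaleList,dtype=object) # arrays of different lengths: save as an object array; reload with np.load(..., allow_pickle=True)
    savingData(savingFp,savingFn,tempData)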
    # X=readingData(savingFp,savingFn)
   
    savingData(savingFp,"POI__LineGraph",totalNumber)  
    Y=readingData(savingFp,"POI__LineGraph")
    savingData(savingFp,"POI__eps",eps)  
    # Z=readingData(savingFp,"POI__eps")
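One possible alternative, sketched under stated assumptions rather than offered as the author's method: scikit-learn's DBSCAN also accepts metric='haversine' (used with algorithm='ball_tree'), which takes [latitude, longitude] in radians and an eps in radians, so eps can be given in metres divided by an Earth radius and the neighbourhood stays circular on the ground. The sketch also drops DBSCAN's noise label -1 before counting cluster sizes (the script's Counter(pred) and np.unique(pred) count it as a cluster), and stores the ragged list of per-eps arrays as a 1-D object array, since recent NumPy versions refuse to build a regular array from lists of unequal length. The function names, the mean Earth radius of 6371008.8 m, the min_samples default, and the demo coordinates are all hypothetical.

```python
import os
import tempfile
from collections import Counter

import numpy as np
from sklearn import cluster, preprocessing

# Mean Earth radius in metres (assumption: spherical Earth model)
EARTH_RADIUS_M = 6371008.8


def sweep_dbscan_haversine(lon, lat, eps_meters, min_samples=3):
    """Run DBSCAN once per eps value (metres) on lon/lat points with the haversine metric.

    Returns (scaled_sizes, totals): scaled_sizes is a ragged list holding the
    robust-scaled points-per-cluster for each eps; totals is the cluster count per eps.
    """
    # scikit-learn's haversine metric expects [latitude, longitude] in radians
    coords = np.radians(np.column_stack([lat, lon]))
    scaled_sizes, totals = [], []
    for eps_m in eps_meters:
        db = cluster.DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=min_samples,
                            metric="haversine", algorithm="ball_tree")
        labels = db.fit_predict(coords)
        # label -1 marks noise points; leave it out of the cluster statistics
        sizes = np.array([n for lbl, n in Counter(labels).items() if lbl != -1], dtype=float)
        totals.append(len(sizes))
        if sizes.size:
            # median/IQR scaling, as in the script, but on real clusters only
            scaled_sizes.append(preprocessing.robust_scale(sizes.reshape(-1, 1)).ravel())
        else:
            scaled_sizes.append(np.array([]))
    return scaled_sizes, totals


def save_ragged(path, arrays):
    """Save a list of 1-D arrays of different lengths as a 1-D object array (.npy)."""
    obj = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        obj[i] = arr  # fill element-wise so equal-length inputs are not stacked into 2-D
    np.save(path, obj)


def load_ragged(path):
    """Load a list saved by save_ragged(); object arrays are pickled, so allow_pickle is required."""
    return list(np.load(path, allow_pickle=True))


# Toy demo with made-up coordinates: two tight groups near downtown Chicago plus one isolated point
lon = np.array([-87.6300, -87.6301, -87.6300, -87.6301, -87.63005,
                -87.6000, -87.6001, -87.6000, -87.6001, -87.7000])
lat = np.array([41.8800, 41.8800, 41.8801, 41.8801, 41.88005,
                41.9000, 41.9000, 41.9001, 41.9001, 41.9500])
scaled, totals = sweep_dbscan_haversine(lon, lat, eps_meters=[50, 5000])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "POI_violin_ragged.npy")
    save_ragged(path, scaled)
    loaded = load_ragged(path)
```

In the demo, eps = 50 m keeps the two groups apart while eps = 5000 m merges them, and the isolated point stays noise in both runs. Because the object array is pickled, load_ragged() passes allow_pickle=True; only load such files from trusted sources.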