Global GPM satellite precipitation data at 0.1° resolution, half-hourly and daily

2021-07-02 10:34:38


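By early to mid July, all of these data (~4 TB) should be fully uploaded to Baidu cloud storage, where they will be free to download for research, study and similar purposes. I hope they will be of real help in your research and studies. That said, data volumes will grow explosively in the near future (5-10 years), and finding the data you need and using them sensibly and scientifically will remain a major challenge. I also hope that China's own meteorological satellites will get quality control right, streamline the data download process and improve the user experience, so that meteorological researchers and students can use domestic satellite products with confidence and their research results can in turn inform upgrades and iterations of those products. Only then will we truly be answering the General Secretary's call: "Scientists and engineers should write their papers on the land of the motherland and apply their scientific and technological achievements to the great cause of modernization."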


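Recently, quite a few people have messaged me privately asking for the GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V06 (GPM_3IMERGHH) data and the daily data. As far as downloading goes, this special period makes heavy demands on the network environment: an ordinary download either keeps re-validating or fails to connect at all, and resuming an interrupted download can leave empty files behind. None of these problems existed in the past, when I downloaded a short period of the data. Over the last few days I worked out some approaches that attempt to solve them: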

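1. Use a V** to improve the network environment, with all of the nearly 4 TB of data going through its traffic. The advantages are that it is simple, fast and stable: once an account and password are registered, you can download with the wget command on Linux.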

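2. Download with wget as recommended on the official site. In my tests, though, only older versions seemed to avoid the repeated re-validation caused by overly long file names; the most stable was GNU wget 1.12 on RedHat. Advantage: it is free. Disadvantages: you need to find a suitable version and a reasonably good network environment, an interrupted download may waste all the effort so far, and resuming with wget can also lead to repeated downloads and empty files.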

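Combining the two approaches, the scheme I finally adopted was to split the list of download links by year and then, under WSL2, route the traffic through an SSR proxy and run download tasks in several terminals at once. This is fairly fast and stable, and if a download is interrupted it can be checked by hand and resumed. The downloaded files need to be verified at the end, and with close to 400,000 files I wrote a Python script to do the check.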
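The post does not show how the link list was split by year. A minimal sketch of one way to do it is given here, assuming the list holds one URL per line and that each file name follows the IMERG half-hourly pattern `3B-HHR.MS.MRG.3IMERG.YYYYMMDD-S...`. The function names and the output naming scheme are hypothetical, not the author's.

```python
import re
from collections import defaultdict

# Assumed IMERG half-hourly file-name pattern, for example
# 3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5
YEAR_RE = re.compile(r"3IMERG\.(\d{4})\d{4}-S\d{6}")


def split_urls_by_year(urls):
    """Group download URLs by the year found in the file name."""
    by_year = defaultdict(list)
    for url in urls:
        match = YEAR_RE.search(url)
        if match:  # lines without a matching file name are skipped
            by_year[match.group(1)].append(url)
    return dict(by_year)


def write_year_lists(list_path, out_prefix):
    """Write one URL list per year as <out_prefix>_<year>.txt (hypothetical naming)."""
    with open(list_path, encoding="utf-8") as fh:
        urls = [line.strip() for line in fh if line.strip()]
    for year, year_urls in sorted(split_urls_by_year(urls).items()):
        with open(f"{out_prefix}_{year}.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(year_urls) + "\n")
```

Each per-year list can then be handed to its own download task in a separate terminal. The verification script mentioned above is shown next.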

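import os


def find_all_files(base, suffix='.HDF5'):
    """Return the names of all files under `base` that end with `suffix`."""
    names = []
    for root, dirs, files in os.walk(base):
        for name in files:
            if name.endswith(suffix):
                names.append(name)
    return names


# Names of the HDF5 files already on disk.
nlist = find_all_files('/mnt/d/')
print(nlist[0:5])

print('========================================')

# Expected entries from the download list (assumed to hold one file name per line).
with open('/mnt/z/Research/05.DATA_All/subset_GPM_3IMERGHH_06_20210624_181150_new.txt') as f:
    allnlist = f.read().splitlines()

print(allnlist[0:5])

print('=======================================')

# Entries in the list that are not on disk yet, sorted for stable output.
diff = sorted(set(allnlist) - set(nlist))
print(diff)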

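The script prints the list entries that are still missing on disk; those files then simply need to be downloaded again. The whole download took roughly two days. For reference only.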
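A name comparison alone does not catch the empty files mentioned earlier, because a zero-byte file still has the right name. A minimal sketch of an extra check is given here; the function name is hypothetical, and a size of zero is only the simplest possible test, not proof that a non-empty file is intact.

```python
import os


def find_empty_files(base, suffix=".HDF5"):
    """Return the paths of zero-byte files under `base` that end with `suffix`."""
    empty = []
    for root, _dirs, files in os.walk(base):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(suffix) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Files reported this way can be deleted and added to the list of files to download again.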

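Below is the log from the per-year downloads; with the V** behind it the speed was acceptable. The demands on CPU and memory may be fairly high (Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz, 32GB), so on a less powerful machine the download can be split into more batches.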

2000 Downloaded: 10267 files, 101G in 1d 2h 26m 26s (1.09 MB/s)
2001 Downloaded: 17520 files, 175G in 1d 19h 41m 37s (1.14 MB/s)
2002 Downloaded: 17520 files, 174G in 1d 19h 36m 6s (1.14 MB/s)
2003 Downloaded: 17520 files, 174G in 1d 19h 39m 37s (1.13 MB/s)
2004 Downloaded: 17568 files, 174G in 1d 19h 46m 6s (1.13 MB/s)
2005 Downloaded: 17520 files, 171G in 1d 19h 24m 45s (1.12 MB/s)
2006 Downloaded: 17520 files, 172G in 1d 19h 29m 45s (1.12 MB/s)
2007 Downloaded: 17520 files, 171G in 1d 19h 30m 6s (1.12 MB/s)
2008 Downloaded: 17568 files, 173G in 1d 19h 49m 24s (1.13 MB/s)
2009 Downloaded: 17520 files, 176G in 1d 19h 55m 14s (1.14 MB/s)
2010 Downloaded: 17520 files, 179G in 1d 20h 10m 14s (1.15 MB/s)
2011 Downloaded: 17520 files, 175G in 1d 19h 49m 22s (1.13 MB/s)
2012 Downloaded: 17568 files, 176G in 1d 20h 1m 37s (1.14 MB/s)
2013 Downloaded: 17520 files, 170G in 1d 19h 25m 31s (1.11 MB/s)
2014 Downloaded: 17520 files, 172G in 1d 19h 38m 32s (1.12 MB/s)
2015 Downloaded: 17520 files, 170G in 1d 19h 25m 29s (1.12 MB/s)
2016 Downloaded: 17568 files, 176G in 1d 19h 56m 31s (1.14 MB/s)
2017 Downloaded: 17520 files, 176G in 1d 19h 49m 10s (1.14 MB/s)
2018 Downloaded: 17520 files, 175G in 1d 19h 53m 37s (1.14 MB/s)
2019 Downloaded: 17520 files, 179G in 1d 20h 10m 15s (1.15 MB/s)
2020 Downloaded: 17568 files, 179G in 1d 20h 13m 2s (1.15 MB/s)
2021 Downloaded: 2832 files, 29G in 7h 36m 17s (1.07 MB/s)
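The file counts in this log can be sanity-checked with simple arithmetic: a complete year of half-hourly data should hold 48 files per day, that is 17520 files in a common year and 17568 in a leap year. A minimal sketch of that check follows, assuming each log line has the form shown above; the function names are hypothetical.

```python
import calendar
import re

# Matches lines such as
# "2004 Downloaded: 17568 files, 174G in 1d 19h 46m 6s (1.13 MB/s)"
LOG_RE = re.compile(r"^(\d{4}) Downloaded: (\d+) files, (\d+)G in")


def parse_log_line(line):
    """Return (year, file_count, gigabytes), or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    year, n_files, gigabytes = (int(g) for g in match.groups())
    return year, n_files, gigabytes


def expected_files_full_year(year):
    """Number of half-hourly files in a complete calendar year (48 per day)."""
    days = 366 if calendar.isleap(year) else 365
    return days * 48
```

Partial years such as 2000 and 2021 will not match the full-year count and have to be compared against the download list instead.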

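The above applies only when the network environment is unstable; if you have not run into similar problems, please skip it. For the quality of the GPM satellite precipitation data, see the relevant literature from China and abroad.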

Brief notes on the half-hourly data

To cite the data in publications:

Huffman, G.J., E.F. Stocker, D.T. Bolvin, E.J. Nelkin, Jackson Tan (2019), GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V06, Greenbelt, MD, Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed: [Data Access Date], 10.5067/GPM/IMERG/3B-HH/06

Brief notes on the daily data

To cite the data in publications:

Huffman, G.J., E.F. Stocker, D.T. Bolvin, E.J. Nelkin, Jackson Tan (2019), GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V06, Edited by Andrey Savtchenko, Greenbelt, MD, Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed: [Data Access Date], 10.5067/GPM/IMERGDF/DAY/06

Official guide to the various download options:

https://disc.gsfc.nasa.gov/data-access#windows_wget

For more about GPM satellite data, see:

https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_06/summary

https://docserver.gesdisc.eosdis.nasa.gov/public/project/GPM/IMERG_ATBD_V06.pdf

https://gpm.nasa.gov/missions/GPM


The following introduction was compiled by: Weekly Weather Review

Why use GPM satellite data?

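Launched on 27 February 2014, NASA's Global Precipitation Measurement (GPM) mission uses satellite observations to provide precipitation data with higher spatial resolution than the TRMM satellite data and with global coverage. For example, the GPM satellite adds a new Ka-band precipitation radar and a high-frequency microwave instrument, which improve the observation of light rain and snowfall. In addition, GPM's integrated multi-satellite retrievals significantly improve temporal and spatial resolution, spatial coverage and so on.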

GPM data products

GPM data products fall into three main levels: Level-1, Level-2 and Level-3. Level-1 is the base data from the individual instruments on the satellites:

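  • Level-1 is further divided into Level-1A, Level-1B and Level-1C. Level-1A is the signal received directly by the instrument, for example the wave energy received by the radar. Level-1B converts the Level-1A signals into the units of the observed variables, for example radar reflectivity. Level-1C applies to the brightness-temperature observations of the microwave radiometers and removes the systematic errors between the individual sensors and the multi-satellite precipitation retrieval algorithms. Level-1 base data keep the native resolution of each instrument and can be used for custom algorithm calculations and other purposes. In other words, with the right method you can do whatever you like with them, though it is hard.
  • Level-2 data are geophysical variables derived from the Level-1 base data at the same resolution and location, for example precipitation retrieved from the Ku-band radar and precipitation from the combined radar and microwave imager. Level-2 precipitation data can be used for case studies, ground validation of precipitation, model validation and so on. So get started on comparison and validation! One can almost see the papers pouring down like rain.
  • Level-3 data are in turn interpolated from Level-2 onto a grid with fixed temporal and spatial resolution, so they are relatively complete and highly consistent. They mainly provide gridded half-hourly, daily and monthly mean precipitation. Half-hourly precipitation comes from the multi-satellite precipitation retrieval, daily precipitation combines the microwave imager and the dual-frequency precipitation radar, and monthly mean precipitation includes all of the above. The most widely used dataset is the multi-satellite, multi-sensor, multi-algorithm Integrated Multi-satellitE Retrievals for GPM (IMERG), which comes as Early, Late and Final Run products. The three differ in release time. Early Run is released 6 hours after the observations are obtained, and Late and Final Run are released after 18 hours and 4 months respectively (the V06 description quoted further below gives about 4 hours, 14 hours and 3.5 months). The Early and Late Run IMERG products are essentially near-real-time observations, calibrated against climatological gauge data. The Final Run is bias-corrected with the monthly gauge data of the Global Precipitation Climatology Centre. Their spatial resolution is 0.1° and their temporal resolution is half an hour, both a large improvement over the TRMM precipitation observations. Precipitation observations with high temporal and spatial resolution matter a great deal for the timing of precipitation and for operational forecasting. The monthly mean data can be used to deepen our understanding of the uncertainty in global precipitation observations.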
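To make the fixed 0.1° Level-3 grid concrete, here is a minimal sketch that maps a longitude and latitude to grid-cell indices and back to the cell centre. It assumes a global grid of 3600 x 1800 cells starting at 180°W and 90°S; the function names are hypothetical, and the index order and origin should be checked against the `lon` and `lat` arrays stored in the files themselves.

```python
import math

RES = 0.1                 # grid spacing in degrees
N_LON, N_LAT = 3600, 1800  # assumed global grid size at 0.1 degree


def grid_index(lon, lat):
    """Return (i_lon, i_lat) of the cell containing the point.

    Assumes cell 0 starts at 180W / 90S (an assumption, not read from a file).
    """
    if not (-180.0 <= lon < 180.0 and -90.0 <= lat < 90.0):
        raise ValueError("point outside the assumed grid")
    # Multiply by 10 rather than divide by 0.1 to limit rounding surprises.
    i_lon = min(int(math.floor((lon + 180.0) * 10.0)), N_LON - 1)
    i_lat = min(int(math.floor((lat + 90.0) * 10.0)), N_LAT - 1)
    return i_lon, i_lat


def cell_center(i_lon, i_lat):
    """Return the (lon, lat) at the centre of a grid cell."""
    return -180.0 + (i_lon + 0.5) * RES, -90.0 + (i_lat + 0.5) * RES
```

For points lying exactly on a cell edge, floating-point rounding can place them in either neighbouring cell, so comparing against the coordinate arrays in the file is the safer choice for real work.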

The figure below describes the GPM data levels

The original text comes from a paper published in BAMS, "Global Precipitation Measurement Mission Products and Services at the NASA GES DISC"; editing and translation by: Weekly Weather Review

The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. Minor Version 06B is the current version of the data set. Older versions will no longer be available and have been superseded by Version 06B.

The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2017 version of the Goddard Profiling Algorithm (GPROF2017), then gridded, intercalibrated to the GPM Combined Ku Radar-Radiometer Algorithm (CORRA) product, and merged into half-hourly 0.1°x0.1° (roughly 10x10 km) fields. Note that CORRA is adjusted to the monthly Global Precipitation Climatology Project (GPCP) Satellite-Gauge (SG) product over high-latitude ocean and tropical land to correct known biases.

The half-hourly intercalibrated merged PMW estimates are then input to both the Climate Prediction Center (CPC) Morphing-Kalman Filter (CMORPH-KF) Lagrangian time interpolation scheme and the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Cloud Classification System (PERSIANN-CCS) re-calibration scheme. In parallel, CPC assembles the zenith-angle-corrected, intercalibrated merged geo-IR fields and forwards them to PPS for input to the PERSIANN-CCS algorithm (supported by an asynchronous re-calibration cycle) which are then input to the CMORPH-KF morphing (quasi-Lagrangian time interpolation) scheme.

The CMORPH-KF morphing (supported by an asynchronous KF weights updating cycle) uses the PMW and IR estimates to create half-hourly estimates. The motion vectors for the morphing are computed by maximizing the pattern correlation of successive hours of the vertically integrated vapor (TQV) provided by the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) and Goddard Earth Observing System model Version 5 (GEOS-5) Forward Processing (FP) for the post-real-time (Final) Run and the near-real-time (Early and Late) Runs, respectively. The KF uses the morphed data as the “forecast” and the IR estimates as the “observations”, with weighting that depends on the time interval(s) away from the microwave overpass time. The IR becomes important after about ±90 minutes away from the overpass time.

The IMERG system is run twice in near-real time: "Early" multi-satellite product ~4 hr after observation time using only forward morphing and "Late" multi-satellite product ~14 hr after observation time, using both forward and backward morphing and once after the monthly gauge analysis is received: "Final", satellite-gauge product ~3.5 months after the observation month, using both forward and backward morphing and including monthly gauge analyses.

Currently, the near-real-time Early and Late half-hourly estimates have no concluding calibration, while in the post-real-time Final Run the multi-satellite half-hourly estimates are adjusted so that they sum to the Final Run monthly satellite-gauge combination. In all cases the output contains multiple fields that provide information on the input data, selected intermediate fields, and estimation quality. In general, the complete calibrated precipitation, precipitationCal, is the data field of choice for most users.

Briefly describing the Final Run, the input precipitation estimates computed from the various satellite passive microwave sensors are intercalibrated to the CORRA product (because it is presumed to be the best snapshot TRMM/GPM estimate after adjustment to the monthly GPCP SG), then "forward/backward morphed" and combined with microwave precipitation-calibrated geo-IR fields, and adjusted with seasonal GPCP SG surface precipitation data to provide half-hourly and monthly precipitation estimates on a 0.1°x0.1° (roughly 10x10 km) grid over the globe. Precipitation phase is computed using analyses of surface temperature, humidity, and pressure. The current period of record is June 2000 to the present (delayed by about 3.5 months).
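As a small worked example of using the half-hourly precipitationCal field mentioned above, the sketch below accumulates one grid cell's half-hourly values into a daily total. It assumes the values are rates in mm/hr and that missing data are flagged with a negative fill value; both should be confirmed from the file attributes. The function name is hypothetical and plain Python lists stand in for the arrays read from the HDF5 files.

```python
def daily_total_mm(rates_mm_per_hr, step_hours=0.5):
    """Accumulate half-hourly rates (mm/hr) into a total in mm.

    Negative values are treated as missing (assumed fill-value convention).
    Returns (total_mm, number_of_valid_steps).
    """
    valid = [rate for rate in rates_mm_per_hr if rate >= 0.0]
    total = sum(rate * step_hours for rate in valid)
    return total, len(valid)
```

A steady 2 mm/hr over all 48 half-hour steps gives 48 mm for the day. When some steps are missing, the returned count shows how much of the day the total actually covers.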
