本文是对
reshape
命令的介绍,力图通过 10 个示例全面介绍该命令。本文所用数据主要来自Stata
的数据管理手册([D] Stata Data Management Reference Manual)和 Mitchell 的教材(Data management using Stata: a practical handbook),前者为网络数据可直接载入,后者为作者提供的数据集,可从文末“补充材料”
处获取。
在数据分析中, 我们所获取的数据集通常具有不同的形态(form),具体有长形(long form)和宽形(wide form)两种(见下方图1和图2)。出于数据分析的需要(尤其是面板数据分析和多层次回归分析),不同形态的数据集各有其用,能够得其一而转换为另一形态便具有很强的实用性。
使用 Stata 进行数据处理十分便捷,其中的reshape
命令就是用于实现数据形态之转换的,即 Convert data from wide to long form and vice versa。不论哪种数据形态,数据集中均包含两种形态下的全部信息,区别在于数据呈现的方式。
本文是对 reshape
命令的介绍,力图通过10个示例全面介绍该命令的使用方式以及需要注意的问题。
基本语法
代码语言:txt复制/*
命令语法结构:
reshape long stubnames, i(varlist) [options]
reshape wide stubnames, i(varlist) [options]
说明:
i(varlist) - use varlist as the ID variables
i(varlist) is required
j(varname [values])
long → wide: varname, existing variable
wide → long: varname, new variable
takes a variable name or a variable name and a list of values
*/
示例 1:快速上手
代码语言:javascript复制clear
set more off
capture log close
global root = "..." //设定一个存放数据的根目录
use "https://www.stata-press.com/data/r17/reshape1", clear
des
list
reshape long inc ue, i(id) j(year) //该数据集是一个 wide 形,将其转换成 long 形数据
list, sep(3)
reshape wide inc ue, i(id) j(year) //同理,把转换后的数据集转回 wide 形
list
generate id = _n //没有ID变量时可以创建一个
示例 2:i 不唯一时由 wide 转 long (报错)
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape2", clear
list
/*
----------------------------------
| id sex inc80 inc81 inc82 |
|----------------------------------|
1. | 1 0 5000 5500 6000 |
2. | 2 1 2000 2200 3300 |
3. | 3 0 3000 2000 1000 |
4. | 2 0 2400 2500 2400 |
----------------------------------
*/
reshape long inc, i(id) j(year) //执行转换后结果如下,会用红色字体报错
/*
(j = 80 81 82)
variable id does not uniquely identify the observations
Your data are currently wide. You are performing a reshape long. You specified i(id) and j(year). In the current wide
form, variable id should uniquely identify the observations. Remember this picture:
long wide
--------------- ------------------
| i j a b | | i a1 a2 b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1 1 1 2 | | 1 1 3 2 4 |
| 1 2 3 4 | | 2 5 7 6 8 |
| 2 1 5 6 | ------------------
| 2 2 7 8 |
---------------
Type reshape error for a list of the problem observations.
*/
reshape error //定位错误的原因为:2 of 4 observations have duplicate i values
/*
----
| id |
|----|
2. | 2 |
3. | 2 |
----
*/
示例 3:j 不唯一时由 long 转 wide (报错)
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshapexp1", clear
list
/*
------------------------
| id year sex inc |
|------------------------|
1. | 1 80 0 5000 |
2. | 1 81 0 5500 |
3. | 1 81 0 5400 |
4. | 1 82 0 6000 |
------------------------
*/
reshape wide inc, i(id) j(year) //执行转换后结果如下,会用红色字体报错
/*
values of variable year not unique within id
Your data are currently long. You are performing a reshape wide. You specified i(id) and j(year). There are observations
within i(id) with the same value of j(year). In the long data, variables i() and j() together must uniquely identify the
observations.
long wide
--------------- ------------------
| i j a b | | i a1 a2 b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1 1 1 2 | | 1 1 3 2 4 |
| 1 2 3 4 | | 2 5 7 6 8 |
| 2 1 5 6 | ------------------
| 2 2 7 8 |
---------------
Type reshape error for a list of the problem variables.
*/
reshape error //2 of 4 observations have repeated year values
/*
-----------
| id year |
|-----------|
2. | 1 81 |
3. | 1 81 |
-----------
*/
示例 4:long 转 wide 时未设定变量(报错)
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape6", clear
list, sepby(id)
/*
-----------------------------
| id year sex inc ue |
|-----------------------------|
1. | 1 80 0 5000 0 |
2. | 1 81 0 5500 1 |
3. | 1 82 0 6000 0 |
|-----------------------------|
4. | 2 80 1 2000 1 |
5. | 2 81 1 2200 0 |
6. | 2 82 1 3300 0 |
|-----------------------------|
7. | 3 80 0 3000 0 |
8. | 3 81 0 2000 0 |
9. | 3 82 0 1000 1 |
-----------------------------
*/
reshape wide inc, i(id) j(year) //ue 变量未设定
/*
variable ue not constant within id
Your data are currently long. You are performing a reshape wide. You typed something like
. reshape wide a b, i(id) j(year)
There are variables other than a, b, id, year in your data. They must be constant within id because that is the only way
they can fit into wide data without loss of information.
The variable or variables listed above are not constant within id. Perhaps the values are in error. Type reshape error
for a list of the problem observations.
Either that, or the values vary because they should vary, in which case you must either add the variables to the list of
xij variables to be reshaped, or drop them.
*/
reshape wide inc ue, i(id) j(year) //正确的设定
list
/*
-------------------------------------------------------
| id inc80 ue80 inc81 ue81 inc82 ue82 sex |
|-------------------------------------------------------|
1. | 1 5000 0 5500 1 6000 0 0 |
2. | 2 2000 1 2200 0 3300 0 1 |
3. | 3 3000 0 2000 0 1000 1 0 |
-------------------------------------------------------
*/
示例 5:wide 转 long 时变量未设定
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape1", clear
list
/*
-------------------------------------------------------
| id sex inc80 inc81 inc82 ue80 ue81 ue82 |
|-------------------------------------------------------|
1. | 1 0 5000 5500 6000 0 1 0 |
2. | 2 1 2000 2200 3300 1 0 0 |
3. | 3 0 3000 2000 1000 0 0 1 |
-------------------------------------------------------
*/
reshape long inc, i(id) j(year) //注意!结果是错误的,但是未报错
list, sepby(id)
/*
---------------------------------------------
| id year sex inc ue80 ue81 ue82 |
|---------------------------------------------|
1. | 1 80 0 5000 0 1 0 |
2. | 1 81 0 5500 0 1 0 |
3. | 1 82 0 6000 0 1 0 |
|---------------------------------------------|
4. | 2 80 1 2000 1 0 0 |
5. | 2 81 1 2200 1 0 0 |
6. | 2 82 1 3300 1 0 0 |
|---------------------------------------------|
7. | 3 80 0 3000 0 0 1 |
8. | 3 81 0 2000 0 0 1 |
9. | 3 82 0 1000 0 0 1 |
---------------------------------------------
*/
reshape wide //to undo the unwanted change and then try again
reshape long inc ue, i(id) j(year)
list, sepby(id)
/*
-----------------------------
| id year inc sex ue |
|-----------------------------|
1. | 1 80 5000 0 0 |
2. | 1 81 5500 0 1 |
3. | 1 82 6000 0 0 |
|-----------------------------|
4. | 2 80 2000 1 1 |
5. | 2 81 2200 1 0 |
6. | 2 82 3300 1 0 |
|-----------------------------|
7. | 3 80 3000 0 0 |
8. | 3 81 2000 0 0 |
9. | 3 82 1000 0 1 |
-----------------------------
*/
示例 6:wide 转 long 时变量缺失下的转换
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape1", clear
drop ue81
list
/*
------------------------------------------------
| id sex inc80 inc81 inc82 ue80 ue82 |
|------------------------------------------------|
1. | 1 0 5000 5500 6000 0 0 |
2. | 2 1 2000 2200 3300 1 0 |
3. | 3 0 3000 2000 1000 0 1 |
------------------------------------------------
*/
reshape long inc ue, i(id) j(year) //reshape placed missing values where ue81 values were unavailable
list, sepby(id)
/*
-----------------------------
| id year sex inc ue |
|-----------------------------|
1. | 1 80 0 5000 0 |
2. | 1 81 0 5500 . |
3. | 1 82 0 6000 0 |
|-----------------------------|
4. | 2 80 1 2000 1 |
5. | 2 81 1 2200 . |
6. | 2 82 1 3300 0 |
|-----------------------------|
7. | 3 80 0 3000 0 |
8. | 3 81 0 2000 . |
9. | 3 82 0 1000 1 |
-----------------------------
*/
reshape wide inc ue, i(id) j(year) //the ue81 variable would be created and would contain all missing values
list
/*
-------------------------------------------------------
| id inc80 ue80 inc81 ue81 inc82 ue82 sex |
|-------------------------------------------------------|
1. | 1 5000 0 5500 . 6000 0 0 |
2. | 2 2000 1 2200 . 3300 0 1 |
3. | 3 3000 0 2000 . 1000 1 0 |
-------------------------------------------------------
*/
示例 7:wide 转 long 时变量的命名
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape3", clear
list
/*
----------------------------------------------------------
| id sex inc80r inc81r inc82r ue80 ue81 ue82 |
|----------------------------------------------------------|
1. | 1 0 5000 5500 6000 0 1 0 |
2. | 2 1 2000 2200 3300 1 0 0 |
3. | 3 0 3000 2000 1000 0 0 1 |
----------------------------------------------------------
*/
reshape long inc@r ue, i(id) j(year) //@ characters to indicate where the numbers go
list, sepby(id)
/*
-----------------------------
| id year sex incr ue |
|-----------------------------|
1. | 1 80 0 5000 0 |
2. | 1 81 0 5500 1 |
3. | 1 82 0 6000 0 |
|-----------------------------|
4. | 2 80 1 2000 1 |
5. | 2 81 1 2200 0 |
6. | 2 82 1 3300 0 |
|-----------------------------|
7. | 3 80 0 3000 0 |
8. | 3 81 0 2000 0 |
9. | 3 82 0 1000 1 |
-----------------------------
*/
use "https://www.stata-press.com/data/r17/reshape3", clear
reshape long inc@r ue@, i(id) j(year) //与上面的转换命令等价,充分掌握 @ 字符的作用
list //结果与上面一致
reshape wide inc@r ue, i(id) j(year) //may similarly be used for converting data from long to wide format
list //注意转换后与原始数据集中变量的顺序是不一致的
/*
----------------------------------------------------------
| id inc80r ue80 inc81r ue81 inc82r ue82 sex |
|----------------------------------------------------------|
1. | 1 5000 0 5500 1 6000 0 0 |
2. | 2 2000 1 2200 0 3300 0 1 |
3. | 3 3000 0 2000 0 1000 1 0 |
----------------------------------------------------------
*/
除了用 @
符号指代数字,wide 转 long 时,还可以在对 j 的设定中为新生成的变量指定数字。
use "$rootcardio_wide3.dta", clear
clist bp* pl*, noobs
/*
bp1 bp2 bp3 bp4 bp5 bp2005 pl1 pl2 pl3 pl4 pl5 pl2005
115 86 129 105 127 112 54 87 93 81 92 81
123 136 107 111 120 119 92 88 125 87 58 90
124 122 101 109 112 113 105 97 128 57 68 91
105 115 121 129 137 121 52 79 71 106 39 69
116 128 112 125 111 118 70 64 52 68 59 62
108 126 124 131 107 119 74 78 92 99 80 84
*/
reshape long bp pl, i(id) j(trial_num)
list, sepby(id)
/*
---------------------------------
| id trial_~m age bp pl |
|---------------------------------|
1. | 1 1 40 115 54 |
2. | 1 2 40 86 87 |
3. | 1 3 40 129 93 |
4. | 1 4 40 105 81 |
5. | 1 5 40 127 92 |
6. | 1 2005 40 112 81 |
|---------------------------------|
7. | 2 1 30 123 92 |
8. | 2 2 30 136 88 |
9. | 2 3 30 107 125 |
10. | 2 4 30 111 87 |
11. | 2 5 30 120 58 |
12. | 2 2005 30 119 90 |
|---------------------------------|
13. | 3 1 16 124 105 |
14. | 3 2 16 122 97 |
15. | 3 3 16 101 128 |
16. | 3 4 16 109 57 |
17. | 3 5 16 112 68 |
18. | 3 2005 16 113 91 |
|---------------------------------|
19. | 4 1 23 105 52 |
20. | 4 2 23 115 79 |
21. | 4 3 23 121 71 |
22. | 4 4 23 129 106 |
23. | 4 5 23 137 39 |
24. | 4 2005 23 121 69 |
|---------------------------------|
25. | 5 1 18 116 70 |
26. | 5 2 18 128 64 |
27. | 5 3 18 112 52 |
28. | 5 4 18 125 68 |
29. | 5 5 18 111 59 |
30. | 5 2005 18 118 62 |
|---------------------------------|
31. | 6 1 27 108 74 |
32. | 6 2 27 126 78 |
33. | 6 3 27 124 92 |
34. | 6 4 27 131 99 |
35. | 6 5 27 107 80 |
36. | 6 2005 27 119 84 |
---------------------------------
*/
若数据集中 bp2005
和 pl2005
是一个同 age
一样的固定变量,此时除了用 rename
命令改名之外,还可以于转换时在 j 中设定转换后的具体数值。
use "$rootcardio_wide3.dta", clear
reshape long bp pl, i(id) j(trial_num 1-5)
list, sepby(id)
/*
---------------------------------------------------
| id trial_~m age bp bp2005 pl pl2005 |
|---------------------------------------------------|
1. | 1 1 40 115 112 54 81 |
2. | 1 2 40 86 112 87 81 |
3. | 1 3 40 129 112 93 81 |
4. | 1 4 40 105 112 81 81 |
5. | 1 5 40 127 112 92 81 |
|---------------------------------------------------|
6. | 2 1 30 123 119 92 90 |
7. | 2 2 30 136 119 88 90 |
8. | 2 3 30 107 119 125 90 |
9. | 2 4 30 111 119 87 90 |
10. | 2 5 30 120 119 58 90 |
|---------------------------------------------------|
11. | 3 1 16 124 113 105 91 |
12. | 3 2 16 122 113 97 91 |
13. | 3 3 16 101 113 128 91 |
14. | 3 4 16 109 113 57 91 |
15. | 3 5 16 112 113 68 91 |
|---------------------------------------------------|
16. | 4 1 23 105 121 52 69 |
17. | 4 2 23 115 121 79 69 |
18. | 4 3 23 121 121 71 69 |
19. | 4 4 23 129 121 106 69 |
20. | 4 5 23 137 121 39 69 |
|---------------------------------------------------|
21. | 5 1 18 116 118 70 62 |
22. | 5 2 18 128 118 64 62 |
23. | 5 3 18 112 118 52 62 |
24. | 5 4 18 125 118 68 62 |
25. | 5 5 18 111 118 59 62 |
|---------------------------------------------------|
26. | 6 1 27 108 119 74 84 |
27. | 6 2 27 126 119 78 84 |
28. | 6 3 27 124 119 92 84 |
29. | 6 4 27 131 119 99 84 |
30. | 6 5 27 107 119 80 84 |
---------------------------------------------------
*/
示例 8:j 为非数值时的转换
代码语言:javascript复制use "https://www.stata-press.com/data/r17/reshape4", clear
list
/*
-------------------------
| id kids incm incf |
|-------------------------|
1. | 1 0 5000 5500 |
2. | 2 1 2000 2200 |
3. | 3 2 3000 2000 |
-------------------------
*/
reshape long inc, i(id) j(gender) string //string option specifies that j take on non-numeric values
list, sepby(id) //gender 为字符型变量
/*
------------------------
| id gender kids inc |
|------------------------|
1. | 1 f 0 5500 |
2. | 1 m 0 5000 |
|------------------------|
3. | 2 f 1 2200 |
4. | 2 m 1 2000 |
|------------------------|
5. | 3 f 2 2000 |
6. | 3 m 2 3000 |
------------------------
*/
reshape wide inc, i(id) j(gender) string
list
/*
-------------------------
| id incf incm kids |
|-------------------------|
1. | 1 5500 5000 0 |
2. | 2 2200 2000 1 |
3. | 3 2000 3000 2 |
-------------------------
*/
当j
为字符时,转换并不要求如incm
和incf
这样有特定的字符长度,可以用@
符号进行指代。
use "https://www.stata-press.com/data/r17/reshapexp2", clear
list
/*
------------------------------
| id kids incmale incfem |
|------------------------------|
1. | 1 0 5000 5500 |
2. | 2 1 2000 2200 |
3. | 3 2 3000 2000 |
------------------------------
*/
reshape long inc@, i(id) j(gender) string
list, sepby(id)
/*
---------------------------
| id gender kids inc |
|---------------------------|
1. | 1 fem 0 5500 |
2. | 1 male 0 5000 |
|---------------------------|
3. | 2 fem 1 2200 |
4. | 2 male 1 2000 |
|---------------------------|
5. | 3 fem 2 2000 |
6. | 3 male 2 3000 |
---------------------------
*/
示例 9: 多层级数据集
代码语言:javascript复制use "$rootweights_long2.dta", clear
list, sepby(id) // information at two different levels, daily measurements within Persons
/*
--------------------------------------------
| id female age race ed days wt |
|--------------------------------------------|
1. | 1 1 22 1 9 7 166 |
2. | 1 1 22 1 9 14 163 |
3. | 1 1 22 1 9 21 164 |
4. | 1 1 22 1 9 28 162 |
|--------------------------------------------|
5. | 2 0 43 2 13 9 188 |
6. | 2 0 43 2 13 13 184 |
7. | 2 0 43 2 13 22 185 |
8. | 2 0 43 2 13 27 182 |
|--------------------------------------------|
9. | 3 0 63 3 11 6 158 |
10. | 3 0 63 3 11 12 155 |
11. | 3 0 63 3 11 31 157 |
|--------------------------------------------|
12. | 4 1 26 2 15 8 192 |
13. | 4 1 26 2 15 17 190 |
14. | 4 1 26 2 15 22 191 |
15. | 4 1 26 2 15 30 193 |
|--------------------------------------------|
16. | 5 1 29 1 12 5 145 |
17. | 5 1 29 1 12 11 142 |
18. | 5 1 29 1 12 20 140 |
19. | 5 1 29 1 12 26 137 |
--------------------------------------------
*/
*多层级数据其实是两个 level 数据集的合并
*a level-1 (time) dataset
use "$rootweights_level1.dta", clear
list, sepby(id)
/*
-----------------
| id days wt |
|-----------------|
1. | 1 7 166 |
2. | 1 14 163 |
3. | 1 21 164 |
4. | 1 28 162 |
|-----------------|
5. | 2 9 188 |
6. | 2 13 184 |
7. | 2 22 185 |
8. | 2 27 182 |
|-----------------|
9. | 3 6 158 |
10. | 3 12 155 |
11. | 3 31 157 |
|-----------------|
12. | 4 8 192 |
13. | 4 17 190 |
14. | 4 22 191 |
15. | 4 30 193 |
|-----------------|
16. | 5 5 145 |
17. | 5 11 142 |
18. | 5 20 140 |
19. | 5 26 137 |
-----------------
*/
*a level-2 (person) dataset
use "$rootweights_level2.dta", clear
list, sepby(id)
/*
-------------------------------
| id female age race ed |
|-------------------------------|
1. | 1 1 22 1 9 |
|-------------------------------|
2. | 2 0 43 2 13 |
|-------------------------------|
3. | 3 0 63 3 11 |
|-------------------------------|
4. | 4 1 26 2 15 |
|-------------------------------|
5. | 5 1 29 1 12 |
-------------------------------
*/
*合并数据集
merge 1:m id using "$rootweights_level1.dta", generate(merge)
sort id days
list, sepby(id)
/*
----------------------------------------------------------
| id female age race ed days wt merge |
|----------------------------------------------------------|
1. | 1 1 22 1 9 7 166 Matched (3) |
2. | 1 1 22 1 9 14 163 Matched (3) |
3. | 1 1 22 1 9 21 164 Matched (3) |
4. | 1 1 22 1 9 28 162 Matched (3) |
|----------------------------------------------------------|
5. | 2 0 43 2 13 9 188 Matched (3) |
6. | 2 0 43 2 13 13 184 Matched (3) |
7. | 2 0 43 2 13 22 185 Matched (3) |
8. | 2 0 43 2 13 27 182 Matched (3) |
|----------------------------------------------------------|
9. | 3 0 63 3 11 6 158 Matched (3) |
10. | 3 0 63 3 11 12 155 Matched (3) |
11. | 3 0 63 3 11 31 157 Matched (3) |
|----------------------------------------------------------|
12. | 4 1 26 2 15 8 192 Matched (3) |
13. | 4 1 26 2 15 17 190 Matched (3) |
14. | 4 1 26 2 15 22 191 Matched (3) |
15. | 4 1 26 2 15 30 193 Matched (3) |
|----------------------------------------------------------|
16. | 5 1 29 1 12 5 145 Matched (3) |
17. | 5 1 29 1 12 11 142 Matched (3) |
18. | 5 1 29 1 12 20 140 Matched (3) |
19. | 5 1 29 1 12 26 137 Matched (3) |
----------------------------------------------------------
*/
*可用于多层次回归分析
xtmixed wt days age || id: , noheader nolog
/*
------------------------------------------------------------------------------
wt | Coefficient Std. err. z P>|z| [95% conf. interval]
------------- ----------------------------------------------------------------
days | -.1294397 .0518173 -2.50 0.012 -.2309997 -.0278797
age | -.1586783 .5521687 -0.29 0.774 -1.240909 .9235525
_cons | 175.5771 21.85316 8.03 0.000 132.7457 218.4085
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects parameters | Estimate Std. err. [95% conf. interval]
----------------------------- ------------------------------------------------
id: Identity |
sd(_cons) | 18.46033 5.853122 9.91638 34.36573
----------------------------- ------------------------------------------------
sd(Residual) | 1.871344 .3536388 1.292103 2.710255
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 58.04 Prob >= chibar2 = 0.0000
*/
示例 10: long 形数据的进一步处理
使用 collapse
命令计算统计量
代码语言:javascript复制*计算均值(默认)
use "$rootcardio_long.dta", clear
list, sepby(id)
/*
------------------------------
| id trial age bp pl |
|------------------------------|
1. | 1 1 40 115 54 |
2. | 1 2 40 86 87 |
3. | 1 3 40 129 93 |
4. | 1 4 40 105 81 |
5. | 1 5 40 127 92 |
|------------------------------|
6. | 2 1 30 123 92 |
7. | 2 2 30 136 88 |
8. | 2 3 30 107 125 |
9. | 2 4 30 111 87 |
10. | 2 5 30 120 58 |
|------------------------------|
11. | 3 1 16 124 105 |
12. | 3 2 16 122 97 |
13. | 3 3 16 101 128 |
14. | 3 4 16 109 57 |
15. | 3 5 16 112 68 |
|------------------------------|
16. | 4 1 23 105 52 |
17. | 4 2 23 115 79 |
18. | 4 3 23 121 71 |
19. | 4 4 23 129 106 |
20. | 4 5 23 137 39 |
|------------------------------|
21. | 5 1 18 116 70 |
22. | 5 2 18 128 64 |
23. | 5 3 18 112 52 |
24. | 5 4 18 125 68 |
25. | 5 5 18 111 59 |
|------------------------------|
26. | 6 1 27 108 74 |
27. | 6 2 27 126 78 |
28. | 6 3 27 124 92 |
29. | 6 4 27 131 99 |
30. | 6 5 27 107 80 |
------------------------------
*/
collapse bp pl, by(id)
list, sep(0)
/*
---------------
| id bp pl |
|---------------|
1. | 1 112 81 |
2. | 2 119 90 |
3. | 3 114 91 |
4. | 4 121 69 |
5. | 5 118 63 |
6. | 6 119 85 |
---------------
*/
*计算其他统计量 //supports many other statistics beyond mean, sd, min, max
use "$rootcardio_long.dta", clear
collapse (min) bpmin=bp plmin=pl (max) bpmax=bp plmax=pl, by(id)
list, sep(0)
/*
------------------------------------
| id bpmin plmin bpmax plmax |
|------------------------------------|
1. | 1 112 81 112 81 |
2. | 2 119 90 119 90 |
3. | 3 114 91 114 91 |
4. | 4 121 69 121 69 |
5. | 5 118 63 118 63 |
6. | 6 119 85 119 85 |
------------------------------------
*/
use "$rootcardio_long.dta", clear
collapse (mean) bpmean=bp plmean=pl (sd) bpsd=bp plsd=pl, by(id)
list, sep(0)
/*
------------------------------------
| id bpmean plmean bpsd plsd |
|------------------------------------|
1. | 1 112 81 18 16 |
2. | 2 119 90 11 24 |
3. | 3 114 91 10 29 |
4. | 4 121 69 12 26 |
5. | 5 118 63 8 7 |
6. | 6 119 85 11 10 |
------------------------------------
*/
计算组内滞后项差值
代码语言:javascript复制use "$rootcardio_long.dta", clear
sort id trial
list, sepby(id)
/*
------------------------------
| id trial age bp pl |
|------------------------------|
1. | 1 1 40 115 54 |
2. | 1 2 40 86 87 |
3. | 1 3 40 129 93 |
4. | 1 4 40 105 81 |
5. | 1 5 40 127 92 |
|------------------------------|
6. | 2 1 30 123 92 |
7. | 2 2 30 136 88 |
8. | 2 3 30 107 125 |
9. | 2 4 30 111 87 |
10. | 2 5 30 120 58 |
|------------------------------|
11. | 3 1 16 124 105 |
12. | 3 2 16 122 97 |
13. | 3 3 16 101 128 |
14. | 3 4 16 109 57 |
15. | 3 5 16 112 68 |
|------------------------------|
16. | 4 1 23 105 52 |
17. | 4 2 23 115 79 |
18. | 4 3 23 121 71 |
19. | 4 4 23 129 106 |
20. | 4 5 23 137 39 |
|------------------------------|
21. | 5 1 18 116 70 |
22. | 5 2 18 128 64 |
23. | 5 3 18 112 52 |
24. | 5 4 18 125 68 |
25. | 5 5 18 111 59 |
|------------------------------|
26. | 6 1 27 108 74 |
27. | 6 2 27 126 78 |
28. | 6 3 27 124 92 |
29. | 6 4 27 131 99 |
30. | 6 5 27 107 80 |
------------------------------
*/
by id: generate pldiff1 = pl - pl[_n-1] //minus the previous
by id: generate pldiff2 = pl - pl[_n-2]
list, sepby(id)
/*
--------------------------------------------------
| id trial age bp pl pldiff1 pldiff2 |
|--------------------------------------------------|
1. | 1 1 40 115 54 . . |
2. | 1 2 40 86 87 33 . |
3. | 1 3 40 129 93 6 39 |
4. | 1 4 40 105 81 -12 -6 |
5. | 1 5 40 127 92 11 -1 |
|--------------------------------------------------|
6. | 2 1 30 123 92 . . |
7. | 2 2 30 136 88 -4 . |
8. | 2 3 30 107 125 37 33 |
9. | 2 4 30 111 87 -38 -1 |
10. | 2 5 30 120 58 -29 -67 |
|--------------------------------------------------|
11. | 3 1 16 124 105 . . |
12. | 3 2 16 122 97 -8 . |
13. | 3 3 16 101 128 31 23 |
14. | 3 4 16 109 57 -71 -40 |
15. | 3 5 16 112 68 11 -60 |
|--------------------------------------------------|
16. | 4 1 23 105 52 . . |
17. | 4 2 23 115 79 27 . |
18. | 4 3 23 121 71 -8 19 |
19. | 4 4 23 129 106 35 27 |
20. | 4 5 23 137 39 -67 -32 |
|--------------------------------------------------|
21. | 5 1 18 116 70 . . |
22. | 5 2 18 128 64 -6 . |
23. | 5 3 18 112 52 -12 -18 |
24. | 5 4 18 125 68 16 4 |
25. | 5 5 18 111 59 -9 7 |
|--------------------------------------------------|
26. | 6 1 27 108 74 . . |
27. | 6 2 27 126 78 4 . |
28. | 6 3 27 124 92 14 18 |
29. | 6 4 27 131 99 7 21 |
30. | 6 5 27 107 80 -19 -12 |
--------------------------------------------------
*/
以上 10 例是对 reshape
命令的介绍。正如我们看到的那样,有时发生错误会报错,但有时不会,这便需要我们对数据形态有一个充分的把握。此外,数据处理的过程除了耐心和细心,技术上看,还体现了对众多命令工具的组合能力以及对单一命令的扩展能力。
鉴于水平有限,如有错误还请读者不吝指教!
补充材料:
版本信息:
2023-10-04(初稿)