行转列是一种常见的数据处理操作,所以对如何在 SQL 语句中、如何在 pandas 中实现这种行转列做过一些总结。请参考我之前写的博文:
- pandas 行转列一种典型输出报表的解决方法
- 使用 Access 查询设计器轻松构造复杂 SQL 语句
- 使用 Access 查询设计器轻松构造复杂 SQL 语句 (2)
今天介绍在 Power Query (PQ) 实现行转列的操作方法,顺便说是稍微复杂一点的 IF 语句如何编写。
新建一个空查询,导入我已经放在 github 上的示例数据:
代码语言:txt复制let
source = Csv.Document(
Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
[Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None])
in
source
可以在高级编辑器中拷贝以上代码,或者根据 url 从 Web 导入数据,因为数据是 csv 格式,所以 PQ 用 Csv.Document()
函数来读取数据的内容。
将第一行作为标题,PQ 自动识别数据类型:
因为公式栏和高级编辑器对中文的支持不好,建议将所有步骤改成英文,步骤名称最好不要有空格。
这一步骤完成后,进入高级编辑器,M 脚本代码如下:
代码语言:txt复制let
source = Csv.Document(
Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
[Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
{{"ID", Int64.Type},
{"YEAR", Int64.Type},
{"MONTH", Int64.Type},
{"ENTITY", Int64.Type},
{"ACCOUNT", type text},
{"DIRECTION", type text},
{"CURRENCY", type text},
{"PROJ_TYPE", type text},
{"PROJ_CODE", type text},
{"AMOUNT", type number}})
in
changedTitle
注意 PQ 将 ACCOUNT
字段识别为 Int64.Type
,我改为了 text
。
接下来增加条件列来进行行转列的操作,说说规则:
代码语言:txt复制- BEGIN_COST (年初成本: 科目为 140401,期间 < 2018 年
- BEGIN_VAR (年初成本变化额:科目为 140404,期间 < 2018 年
- PER_COST_ADD (期间成本增加:科目为 140401,方向为 S, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值增加:科目为 140404,方向为 S, Year = 2018, Month <= 4
- PER_COST_DECT (期间成本减少:科目为 140401,方向为 H, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值减少:科目为 140404,方向为 H, Year = 2018, Month <= 4
增加一个条件列,我们可以看到,PQ 并不能表达 [ACCOUNT] = "140401" and [YEAR] < 2018
,对话框中只能根据某一个列来设置条件:
我们需要点击确定后,在公式栏或高级编辑器中编写条件表达式:
选中步骤 addedBeginCost,右键,在后面插入一个步骤
因为步骤的公式类似,可以从上一步骤的公式拷贝过来进行改写:
用类似的方法添加步骤,计算出其他栏位。完成后 M 语言脚本如下:
代码语言:txt复制let
source = Csv.Document(
Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
[Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
{{"ID", Int64.Type},
{"YEAR", Int64.Type},
{"MONTH", Int64.Type},
{"ENTITY", Int64.Type},
{"ACCOUNT", type text},
{"DIRECTION", type text},
{"CURRENCY", type text},
{"PROJ_TYPE", type text},
{"PROJ_CODE", type text},
{"AMOUNT", type number}}),
addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
addedPerCostAdd = Table.AddColumn(
addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
addedPerVarAdd = Table.AddColumn(
addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
addedPerCostDect = Table.AddColumn(
addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
addedPerVarDect = Table.AddColumn(
addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0)
in
addedPerVarDect
根据 PROJ_CODE
进行分组:
完成之后数据格式如下:
完成之后 M 语言脚本如下:
代码语言:txt复制let
source = Csv.Document(
Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
[Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
筛选的行 = Table.SelectRows(firstLineAsTitle, each true),
changedTitle = Table.TransformColumnTypes(筛选的行,
{{"ID", Int64.Type},
{"YEAR", Int64.Type},
{"MONTH", Int64.Type},
{"ENTITY", Int64.Type},
{"ACCOUNT", type text},
{"DIRECTION", type text},
{"CURRENCY", type text},
{"PROJ_TYPE", type text},
{"PROJ_CODE", type text},
{"AMOUNT", type number}}),
addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
addedPerCostAdd = Table.AddColumn(
addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
addedPerVarAdd = Table.AddColumn(
addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
addedPerCostDect = Table.AddColumn(
addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
addedPerVarDect = Table.AddColumn(
addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
grouped = Table.Group(addedPerVarDect, {"PROJ_CODE"}, {
{"BEGIN_COST", each List.Sum([BEGIN_COST]), type number},
{"BEGIN_VAR", each List.Sum([BEGIN_VAR]), type number},
{"PER_COST_ADD", each List.Sum([PER_COST_ADD]), type number},
{"PER_VAR_ADD", each List.Sum([PER_VAR_ADD]), type number},
{"PER_COST_DECT", each List.Sum([PER_COST_DECT]), type number},
{"PER_VAR_DECT", each List.Sum([PER_VAR_DECT]), type number}
}),
sortedLines = Table.Sort(grouped,{{"PROJ_CODE", Order.Ascending}})
in
sortedLines
本示例贴了较多代码,作用 1 是结合操作步骤观察和理解 M 代码,作用二是可以直接将代码粘贴到高级编辑器,生成查询,直接学习。
Table.Group()
函数语法:
Table.Group(table as table,
key as any,
aggregatedColumns as list,
optional groupKind as nullable number,
optional comparer as nullable function) as table