Power Query 系列 (08) - 行转列案例

2021-03-25 09:45:44 浏览数 (1)

行转列是一种常见的数据处理操作,所以对如何在 SQL 语句中、如何在 pandas 中实现这种行转列做过一些总结。请参考我之前写的博文:

  • pandas 行转列一种典型输出报表的解决方法
  • 使用 Access 查询设计器轻松构造复杂 SQL 语句
  • 使用 Access 查询设计器轻松构造复杂 SQL 语句 (2)

今天介绍在 Power Query (PQ) 实现行转列的操作方法,顺便说是稍微复杂一点的 IF 语句如何编写。

新建一个空查询,导入我已经放在 github 上的示例数据:

代码语言:txt复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None])
in
    source

可以在高级编辑器中拷贝以上代码,或者根据 url 从 Web 导入数据,因为数据是 csv 格式,所以 PQ 用 Csv.Document() 函数来读取数据的内容。

将第一行作为标题,PQ 自动识别数据类型:

因为公式栏高级编辑器对中文的支持不好,建议将所有步骤改成英文,步骤名称最好不要有空格。

这一步骤完成后,进入高级编辑器,M 脚本代码如下:

代码语言:txt复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}})
in
    changedTitle

注意 PQ 将 ACCOUNT 字段识别为 Int64.Type,我改为了 text

接下来增加条件列来进行行转列的操作,说说规则:

代码语言:txt复制
- BEGIN_COST (年初成本: 科目为 140401,期间  < 2018 年
- BEGIN_VAR (年初成本变化额:科目为 140404,期间 < 2018 年
- PER_COST_ADD (期间成本增加:科目为 140401,方向为 S, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值增加:科目为 140404,方向为 S, Year = 2018, Month <= 4
- PER_COST_DECT (期间成本减少:科目为 140401,方向为 H, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值减少:科目为 140404,方向为 H, Year = 2018, Month <= 4

增加一个条件列,我们可以看到,PQ 并不能表达 [ACCOUNT] = "140401" and [YEAR] < 2018,对话框中只能根据某一个列来设置条件:

我们需要点击确定后,在公式栏或高级编辑器中编写条件表达式:

选中步骤 addedBeginCost,右键,在后面插入一个步骤

因为步骤的公式类似,可以从上一步骤的公式拷贝过来进行改写:

用类似的方法添加步骤,计算出其他栏位。完成后 M 语言脚本如下:

代码语言:txt复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}}),
    
    addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
    addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
    
    addedPerCostAdd = Table.AddColumn(
        addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerVarAdd = Table.AddColumn(
        addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerCostDect = Table.AddColumn(
        addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerVarDect = Table.AddColumn(
        addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0)
in
    addedPerVarDect

根据 PROJ_CODE 进行分组:

完成之后数据格式如下:

完成之后 M 语言脚本如下:

代码语言:txt复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas sample data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    筛选的行 = Table.SelectRows(firstLineAsTitle, each true),
    changedTitle = Table.TransformColumnTypes(筛选的行,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}}),
    
    addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
    addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
    
    addedPerCostAdd = Table.AddColumn(
        addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerVarAdd = Table.AddColumn(
        addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerCostDect = Table.AddColumn(
        addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerVarDect = Table.AddColumn(
        addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    grouped = Table.Group(addedPerVarDect, {"PROJ_CODE"}, {
            {"BEGIN_COST", each List.Sum([BEGIN_COST]), type number}, 
            {"BEGIN_VAR", each List.Sum([BEGIN_VAR]), type number}, 
            {"PER_COST_ADD", each List.Sum([PER_COST_ADD]), type number}, 
            {"PER_VAR_ADD", each List.Sum([PER_VAR_ADD]), type number}, 
            {"PER_COST_DECT", each List.Sum([PER_COST_DECT]), type number}, 
            {"PER_VAR_DECT", each List.Sum([PER_VAR_DECT]), type number}
        }),
        
    sortedLines = Table.Sort(grouped,{{"PROJ_CODE", Order.Ascending}})
in
    sortedLines

本示例贴了较多代码,作用 1 是结合操作步骤观察和理解 M 代码,作用二是可以直接将代码粘贴到高级编辑器,生成查询,直接学习。

Table.Group() 函数语法:

代码语言:txt复制
Table.Group(table as table, 
            key as any, 
            aggregatedColumns as list, 
            optional groupKind as nullable number, 
            optional comparer as nullable function) as table

0 人点赞