Contents
1. Knowledge Graph (KG)
2. Knowledge Representation
3. RDF
4. RDFS (RDF Schema)
5. OWL
6. SPARQL
7. RDB2RDF
8. D2RQ
9. Knowledge Graph Storage Schemes
10. Protégé
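1. Knowledge Graph (KG)
- A knowledge graph is a technique that uses a graph model to describe knowledge and to model the relationships among all things in the world.
- A knowledge graph consists of nodes and edges.
  - A node can be an entity (for example, a person or a book).
  - An edge can be an attribute of an entity (for example, a name or a book title) or a relationship between entities (for example, friend or spouse).
- Knowledge graphs aim to identify, discover, and infer the complex relationships between things and concepts from data; they are a computable model of the relationships between things.
- History of knowledge graphs
  - Semantic networks: Semantic networks are a knowledge representation model proposed by Quillian in the 1960s. They represent knowledge with interconnected nodes and edges: nodes stand for objects and concepts, and edges stand for the relationships between nodes.
  - Ontology: The term ontology was borrowed from philosophy into computer science, where it is used to characterize knowledge. At its core it denotes a model that describes a world made up of a set of object types (concepts, or classes), properties, and relationship types. AI researchers held that ontologies could be built as computational models, enabling certain kinds of automated reasoning.
  - World Wide Web (WWW): In 1989 Tim Berners-Lee invented the World Wide Web, an information system centered on links. Anyone can link their own documents into it by adding links.
  - Semantic Web: In 1994 Tim Berners-Lee argued that the Web should be more than links between web pages, and in 1998 he proposed the concept of the Semantic Web. In essence, the Semantic Web is a Web of Data, or a Web of Things.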
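The node-and-edge model described at the start of this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `KnowledgeGraph` class, its method names, and the `person:` / `book:` identifiers are all hypothetical and do not come from any particular library.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph: entities are nodes, attributes and relations are labeled edges."""

    def __init__(self):
        self.attributes = defaultdict(dict)  # entity -> {attribute name: value}
        self.relations = defaultdict(set)    # entity -> {(relation name, target entity)}

    def add_attribute(self, entity, name, value):
        self.attributes[entity][name] = value

    def add_relation(self, source, name, target):
        self.relations[source].add((name, target))

    def related(self, entity, name):
        """Return the entities reachable from `entity` over edges labeled `name`."""
        return {target for label, target in self.relations.get(entity, ()) if label == name}


kg = KnowledgeGraph()
kg.add_attribute("person:alice", "name", "Alice")
kg.add_attribute("book:1", "title", "An Example Book")
kg.add_relation("person:alice", "friend", "person:bob")
kg.add_relation("person:bob", "spouse", "person:carol")

# Two-hop question: who are the spouses of Alice's friends?
spouses = {s for f in kg.related("person:alice", "friend") for s in kg.related(f, "spouse")}
print(spouses)
```

Following edges across several hops, as in the last query, is the kind of relationship discovery the graph model is meant to make cheap.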
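2. Knowledge Representation
- Knowledge representation refers to the methods and techniques for describing and representing the knowledge in the human brain with computer symbols, so that machines can simulate human reasoning.
- Knowledge representation methods from the early days of artificial intelligence:
  - First-order predicate logic
  - Horn clauses and Horn logic
  - Semantic networks
  - Frames
  - Description logic
  - Production systems
- Semantic Web knowledge representation frameworks of the Internet era:
  - RDF, RDFS
  - OWL, OWL 2 fragments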
3. RDF
- RDF is an international standard, developed by the W3C RDF Working Group, for representing and exchanging knowledge graph data.
- The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources.
- RDF is the foundation of the Semantic Web and what provides its innate flexibility. All data in the Semantic Web is represented in RDF, including schema describing RDF data.
- RDF is not like the tabular data model of relational databases. Nor is it like the trees of the XML world. Instead, RDF is a graph.
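Figure 3-1: The W3C Semantic Web standards stack
- RDF consists of nodes and edges: nodes represent entities/resources and attribute values, and edges represent the relationships between entities as well as between an entity and its attributes.
- In RDF, knowledge always takes the form of triples: every piece of knowledge can be decomposed into (subject, predicate, object).
- An RDF triple can be viewed as an edge and its two vertices in a graph model: (vertex, edge, vertex).
- RDF serialization formats (RDF is an abstract data model that supports several serialization formats):
  - RDF/XML
  - N-Triples
  - Turtle
  - RDFa
  - JSON-LD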
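To make the distinction between the abstract triple model and its serializations concrete, here is a minimal sketch that holds triples in memory and writes them out as N-Triples. The `IRI` and `Literal` classes, the helper names, and the `http://example.org/` namespace are assumptions made for illustration; the escaping covers only backslashes and double quotes, so this is not a complete N-Triples writer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


EX = "http://example.org/"  # hypothetical namespace

# Each entry is one (subject, predicate, object) triple, i.e. (vertex, edge, vertex).
triples = [
    (IRI(EX + "alice"), IRI(EX + "name"), Literal("Alice")),
    (IRI(EX + "alice"), IRI(EX + "knows"), IRI(EX + "bob")),
]


def term_to_nt(term):
    """Render one term: IRIs in angle brackets, plain literals in double quotes."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(graph):
    """Serialize the triples one per line, each terminated by ' .'."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in graph
    )


print(to_ntriples(triples))
```

The same `triples` list could equally be written out as Turtle or JSON-LD; only the serializer would change, not the data model.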
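4. RDFS (RDF Schema)
- RDF Schema (RDFS) is an extension of RDF.
- RDF describes concrete things but lacks the ability to abstract: it cannot define or describe a whole category of things.
- In essence, RDFS is an extension of the RDF vocabulary.
- RDF Schema provides a data-modelling vocabulary for RDF data. RDF Schema is an extension of the basic RDF vocabulary.
- Building on RDF, RDFS defines classes, properties, and relations for describing resources, and constrains resources through the domain and range of properties. RDFS adds a schema layer on top of the data layer: the schema layer defines a set of constraint rules, and the data layer is an instance population that follows those rules.
- RDFS is RDF! RDFS is expressed as RDF!
- RDF is a graph data model. RDFS, on the other hand, is object-oriented in nature.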
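One way to read the domain and range declarations described above is through the RDFS entailment rules: if a property has a domain, every subject that uses it is inferred to be an instance of that class, and likewise for range and objects. The sketch below applies those two rules once. The `ex:` names are made up for illustration, terms are plain strings rather than full IRIs, and literal objects are not treated specially. Schema triples and data triples sit in the same set, which is what "RDFS is expressed as RDF" amounts to.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Schema layer (first two triples) and data layer (last triple) share one graph.
# The ex: names are hypothetical.
triples = {
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:author", RDFS_RANGE, "ex:Person"),
    ("ex:book1", "ex:author", "ex:alice"),
}


def infer_types(graph):
    """Apply the domain and range rules once and return only the new rdf:type triples."""
    domains = {(s, o) for s, p, o in graph if p == RDFS_DOMAIN}
    ranges = {(s, o) for s, p, o in graph if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in graph:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # subject belongs to the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))  # object belongs to the range class
    return inferred - graph


for triple in sorted(infer_types(triples)):
    print(triple)
```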
RDFS example:
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://www.animals.fake/animals#">
  <rdfs:Class rdf:ID="animal" />
  <rdfs:Class rdf:ID="horse">
    <rdfs:subClassOf rdf:resource="#animal"/>
  </rdfs:Class>
</rdf:RDF>
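As a rough illustration of what the RDFS example above encodes, the following sketch reads it with Python's standard `xml.etree.ElementTree` module and follows `rdfs:subClassOf` transitively. It is deliberately naive: it only understands the `rdfs:Class` / `rdf:ID` / `rdf:resource="#..."` shape used in this one document, and the function names are hypothetical. A full RDF library would be the right tool for real RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

SCHEMA_XML = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://www.animals.fake/animals#">
  <rdfs:Class rdf:ID="animal" />
  <rdfs:Class rdf:ID="horse">
    <rdfs:subClassOf rdf:resource="#animal"/>
  </rdfs:Class>
</rdf:RDF>
"""


def subclass_edges(xml_text):
    """Return (child, parent) pairs for every rdfs:subClassOf element."""
    root = ET.fromstring(xml_text)
    edges = set()
    for cls in root.findall(f"{{{RDFS_NS}}}Class"):
        child = cls.get(f"{{{RDF_NS}}}ID")
        for sub in cls.findall(f"{{{RDFS_NS}}}subClassOf"):
            parent = sub.get(f"{{{RDF_NS}}}resource").lstrip("#")
            edges.add((child, parent))
    return edges


def superclasses(name, edges):
    """Collect direct and indirect superclasses, since rdfs:subClassOf is transitive."""
    found, todo = set(), [name]
    while todo:
        current = todo.pop()
        for child, parent in edges:
            if child == current and parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


edges = subclass_edges(SCHEMA_XML)
print(edges)
print(superclasses("horse", edges))
```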
5. OWL
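RDF(S) can express some simple semantics, but in more complex scenarios its expressive power is too weak and many commonly needed features are missing. These include property definitions with local range restrictions; equivalence of classes, properties, and individuals; disjoint classes; cardinality constraints; and descriptions of property characteristics. W3C therefore proposed the OWL language as an extension of RDF(S) and as the recommended language for representing ontologies on the Semantic Web.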
The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web.
The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language that can formally describe the meaning of terminology used in Web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema. The OWL Use Cases and Requirements Document provides more details on ontologies, motivates the need for a Web Ontology Language in terms of six use cases, and formulates design goals, requirements and objectives for OWL.
OWL has been designed to meet this need for a Web Ontology Language. OWL is part of the growing stack of W3C recommendations related to the Semantic Web.
- XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
- XML Schema is a language for restricting the structure of XML documents and also extends XML with datatypes.
- RDF is a datamodel for objects ("resources") and relations between them, provides a simple semantics for this datamodel, and these datamodels can be represented in an XML syntax.
- RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization-hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
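Two of the OWL features listed above, property symmetry and class disjointness, can be illustrated with a small hand-rolled sketch. This is not an OWL reasoner: the sets below stand in for `owl:SymmetricProperty` and `owl:disjointWith` declarations, the `ex:` names are hypothetical, and only these two features are covered.

```python
# Hypothetical mini-ontology standing in for OWL axioms.
symmetric_properties = {"ex:spouse"}
disjoint_classes = {frozenset({"ex:Person", "ex:Book"})}

facts = {
    ("ex:alice", "ex:spouse", "ex:bob"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:thing1", "rdf:type", "ex:Person"),
    ("ex:thing1", "rdf:type", "ex:Book"),
}


def apply_symmetry(graph):
    """Add (o, p, s) for every (s, p, o) whose property is declared symmetric."""
    return graph | {(o, p, s) for s, p, o in graph if p in symmetric_properties}


def disjointness_violations(graph):
    """Return individuals typed with two classes that are declared disjoint."""
    types = {}
    for s, p, o in graph:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return {
        s for s, classes in types.items()
        if any(pair <= classes for pair in disjoint_classes)
    }


print(("ex:bob", "ex:spouse", "ex:alice") in apply_symmetry(facts))
print(disjointness_violations(facts))
```

Symmetry produces new facts, while disjointness detects an inconsistency. Neither can be stated with the RDFS vocabulary alone.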
6. SPARQL
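SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language. It was designed specifically for accessing and manipulating RDF data and is one of the core technologies of the Semantic Web. It was standardized by the W3C RDF Data Access Working Group (DAWG), and on 15 January 2008 SPARQL became an official W3C Recommendation.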
SPARQL, a query language for RDF, can join data from different databases, as well as documents, inference engines, or anything else that might express its knowledge as a directed labeled graph.
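At the heart of a SPARQL `SELECT` query is a set of triple patterns that share variables and are joined against the graph. The sketch below imitates that idea in plain Python. It is not a SPARQL engine and parses no SPARQL syntax; the function names, the `?var` string convention, and the `ex:` data are assumptions for illustration. The query it mimics is shown in the comment.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable either gets bound now or must agree with its earlier binding.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def select(graph, patterns):
    """Evaluate the triple patterns in order, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
]

# Roughly: SELECT ?name WHERE { ex:alice ex:knows ?friend . ?friend ex:name ?name }
patterns = [("ex:alice", "ex:knows", "?friend"), ("?friend", "ex:name", "?name")]
names = [b["?name"] for b in select(graph, patterns)]
print(names)
```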
7. RDB2RDF
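Knowledge graph data comes from three main kinds of sources: structured data, semi-structured data, and unstructured data. RDB2RDF addresses structured data held in relational databases.
Official RDB2RDF standards:
The W3C RDB2RDF Working Group has produced two standards for converting relational database data into RDF:
- Direct Mapping of Relational Data to RDF
  - The Direct Mapping is an automatic mapping of a relational database to RDF.
- R2RML: RDB to RDF Mapping Language
  - R2RML is a customizable language to map a relational database to RDF.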
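The idea behind an automatic mapping can be sketched as follows: each row becomes a subject whose IRI is built from the table name and primary key, the row is typed with the table, and each non-NULL column becomes one triple. This loosely follows the shape of the Direct Mapping but is heavily simplified. Foreign keys, tables without a primary key, datatypes, and IRI percent-encoding are all omitted, and the base IRI, table, and function names are hypothetical.

```python
def literal(value):
    """Render numbers bare and everything else as a quoted string (simplified)."""
    return str(value) if isinstance(value, (int, float)) else f'"{value}"'


def direct_map_row(base, table, primary_key, row):
    """Map one relational row (a dict of column -> value) to a list of triples."""
    subject = f"<{base}{table}/{primary_key}={row[primary_key]}>"
    triples = [(subject, "rdf:type", f"<{base}{table}>")]
    for column, value in row.items():
        if value is None:
            continue  # a NULL column produces no triple
        triples.append((subject, f"<{base}{table}#{column}>", literal(value)))
    return triples


row = {"ID": 7, "fname": "Bob", "addr": None}
for triple in direct_map_row("http://foo.example/DB/", "People", "ID", row):
    print(*triple, ".")
```

R2RML differs in that the subject IRI template, the predicates, and the target classes are chosen by the mapping author instead of being derived from table and column names.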
RDB2RDF tools:
- Ontop
  - Ontop is a Virtual Knowledge Graph system. It exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.
  - Ontop translates SPARQL queries expressed over the knowledge graphs into SQL queries executed by the relational data sources. It relies on R2RML mappings and can take advantage of lightweight ontologies.
- SparqlMap
  - A SPARQL to SQL rewriter based on the R2RML specification.
- D2RQ
  - D2RQ provides its own mapping language (the D2RQ Mapping Language), which is similar in form to R2RML.
- Triplify
  - Triplify is a small PHP plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.
8. D2RQ
The D2RQ Platform is a system for accessing relational databases as virtual, read-only RDF graphs. It offers RDF-based access to the content of relational databases without having to replicate it into an RDF store.
The D2RQ Platform consists of:
- The D2RQ Mapping Language, a declarative language for describing the relation between a relational database schema and RDFS vocabularies or OWL ontologies.
  - A D2RQ mapping is itself an RDF document written in Turtle syntax.
- The D2RQ Engine, a plug-in for the Jena Semantic Web toolkit, which uses the mappings to rewrite Jena API calls to SQL queries against the database and passes query results up to the higher layers of the framework.
- D2R Server, an HTTP server that provides a Linked Data view, an HTML view for debugging, and a SPARQL Protocol endpoint over the database.
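The notion of a virtual, read-only RDF graph can be illustrated with a toy rewriter: a triple pattern with a fixed predicate is turned into a SQL query, and triples are produced on demand from the result rows, with nothing copied into a triple store. The sketch uses Python's standard `sqlite3` module. The `MAPPING` dictionary is an ad hoc Python structure invented for this example; it is not the D2RQ Mapping Language, and none of the names below are part of the D2RQ API.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {
    "ex:name": ("people", "id", "name"),
    "ex:email": ("people", "id", "email"),
}


def sql_for(predicate):
    """Rewrite the triple pattern (?s, predicate, ?o) into a SQL query string."""
    table, key, column = MAPPING[predicate]
    return f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL"


def virtual_triples(conn, predicate):
    """Yield triples computed from the database at query time."""
    for key, value in conn.execute(sql_for(predicate)):
        yield (f"ex:person/{key}", predicate, value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.org"), (2, "Bob", None)],
)

print(sql_for("ex:email"))
print(list(virtual_triples(conn, "ex:email")))
```

Because the triples are derived at query time, changes in the relational data are visible immediately, which is the main appeal of the virtual approach over a one-off export.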
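9. Knowledge Graph Storage Schemes
- Storage schemes based on relational databases
  - Triple table
  - Property table
  - Horizontal table
  - Vertical partitioning
  - Sextuple indexing
- RDF-oriented triple stores
- Native graph databases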
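The simplest of the relational schemes listed above, the triple table, can be sketched with Python's standard `sqlite3` module: every triple is one row in a single three-column table, and each additional triple pattern in a query becomes another self-join. The table name, column names, and `ex:` data are assumptions chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One table holds the whole graph: subject, predicate, object.
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:name", "Bob"),
        ("ex:alice", "ex:name", "Alice"),
    ],
)

# "Names of the people Alice knows": two triple patterns, hence one self-join.
rows = db.execute(
    """
    SELECT n.o
    FROM triples AS k
    JOIN triples AS n ON n.s = k.o
    WHERE k.s = 'ex:alice' AND k.p = 'ex:knows' AND n.p = 'ex:name'
    """
).fetchall()
print(rows)
```

The growing number of self-joins on one large table is the usual motivation for the other schemes in the list. Vertical partitioning, for example, keeps one two-column table per predicate instead.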
10. Protégé
The Protégé Project offers WebProtégé and Protégé Desktop, which are free and open-source ontology editing applications.
Protégé Desktop is a feature-rich ontology editing environment with full support for the OWL 2 Web Ontology Language, and direct in-memory connections to description logic reasoners like HermiT and Pellet.
References:
- 《知识图谱 方法、实践与应用》 (Knowledge Graph: Methods, Practice and Applications)
- An Introduction to RDF and the Jena RDF API: http://jena.apache.org/tutorials/rdf_api.html
- RDF Schema 1.1: https://www.w3.org/TR/rdf-schema/
- OWL Web Ontology Language: https://www.w3.org/TR/2004/REC-owl-features-20040210/
- RDF and SPARQL: Using Semantic Web Technology to Integrate the World's Data: https://www.w3.org/2007/03/VLDB/
- Semantic University: https://www.cambridgesemantics.com/blog/semantic-university/learn-rdf/ and https://www.cambridgesemantics.com/blog/semantic-university/learn-owl-rdfs/
- RDB2RDF: Relational Database to RDF: http://www.rdb2rdf.org/
- A Direct Mapping of Relational Data to RDF: https://www.w3.org/TR/rdb-direct-mapping/
- R2RML: RDB to RDF Mapping Language: https://www.w3.org/TR/r2rml/
- SparqlMap: https://github.com/tomatophantastico/sparqlmap
- Ontop: https://github.com/ontop/ontop
- D2RQ: http://d2rq.org/
- DB-Engines Ranking: https://db-engines.com/en/ranking
- Protege: https://protege.stanford.edu/