Foreword
In the last 20 years the amount of data created has grown massively. The need to understand this data, communicate what it means and use it to make better decisions has also grown. What has not changed is human biology, so our brains must make sense of this ever-increasing amount of information. As pictures are easier to understand than numbers, good visualisations have become more important as data grows in quantity, size and complexity.
Data comes in different kinds, so it demands different methods to make sense of it. It is not possible to have a single tool or program that will work for all datasets, so we must be flexible. Often we have to manipulate data before we can visualise it. In fact, a visualisation is typically part of a wider analysis, so we must learn to write code to analyse and visualise the data. Programming is the means by which we gain that flexibility.
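As a minimal sketch of "manipulate first, then visualise", the snippet below totals a small table by month with Pandas and draws the result as a bar chart with Matplotlib. The data, the column names `month` and `units`, and the choice of a bar chart are all made up for illustration; they do not come from this book.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw records: several rows per month (illustrative values only).
raw = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar"],
    "units": [3, 5, 2, 6, 4],
})

# Manipulation step: total units per month, keeping the order of first appearance.
monthly = raw.groupby("month", sort=False)["units"].sum()

# Visualisation step: one bar per month.
fig, ax = plt.subplots()
ax.bar(list(monthly.index), list(monthly.values))
ax.set_xlabel("month")
ax.set_ylabel("units")
```

Calling `fig.savefig("units_by_month.png")` afterwards would write the chart to a file; the file name is likewise only an example.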
Now comes the first choice: in what programming language shall we write the code? We have to choose at least one, and the authors of this book have chosen the Python programming language.
Python is a widely used general-purpose programming language that is easy to learn. It has been embraced by a large scientific computing community, which has created an open ecosystem of packages for analysing and visualising data. By choosing Python, these packages become available to you free of charge. For example, key packages like NumPy and Pandas, which are covered in Chapter 2, make it possible to represent data in sequences and in tables, and they provide many useful methods to act on this data.
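To make "sequences and tables" concrete, here is a small sketch, assuming NumPy and Pandas are installed. It stores a sequence in a NumPy array and a table in a Pandas DataFrame, then applies one built-in method to each. All values and names (`heights_cm`, `city`, `sales`) are invented for illustration.

```python
import numpy as np
import pandas as pd

# A sequence of numbers as a NumPy array (illustrative values).
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
mean_height = heights_cm.mean()  # arithmetic mean of the sequence

# A small table as a Pandas DataFrame (illustrative values).
table = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})

# Act on the table: total sales for each city.
sales_by_city = table.groupby("city")["sales"].sum()
```

Here `mean_height` is 165.0, and `sales_by_city` holds 40 for city A and 60 for city B.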
The next choice is which packages to use for visualisation. The authors have three choices for you: Matplotlib, Seaborn and Plotnine. Are they good choices? Yes, they are.
Matplotlib is the most widely used package for data visualisation in Python. Powerful and versatile, it can be used to create figures for publication or to create interactive environments. In 1999, Leland Wilkinson introduced an elegant way to think about data visualisation in his book "The Grammar of Graphics". This "Grammar" gives us a structured way to transform data into a visualisation, and it makes it easy to create many kinds of complicated plots. This is where the Seaborn and Plotnine packages come in: they are built on top of Matplotlib and are inspired by ggplot2, an implementation of "The Grammar of Graphics" for the R language by Hadley Wickham.
The programming language and key packages are choices made for you, but making beautiful visualisations requires many more choices. These choices change depending on the data, the display medium and the audience; they are what this book will help you learn to make. Here you will be exposed to a variety of plots and learn about the advantages of different plots for the same data. You will learn about "The Grammar of Graphics", how to create visualisations with multiple plots and how to customise them. Ultimately, you will learn how to make beautiful visualisations.
No