Data Graphics in R

Reposted article; original dated 2017-08-24 23:06. Categories: R language / R language study notes / Graphics

R supports four graphics systems: base graphics, grid graphics, lattice graphics, and ggplot2.

Base graphics is R's default graphics system.

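I. The basic plotting function plot()

The type argument of plot() determines how the data are drawn. Commonly used values are:

"p" for "points"
"l" for "lines"
"o" for "overlaid" (for example, lines drawn over the points)
"s" for "steps"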

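The special option type = "n" draws the axes but no data. It is useful for setting up a plot to which data from several sources are then added on the same axes.

For example, a basic call that sets the point shape and color: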

plot(x,y,xlab="",ylab="",pch=2,col="red")

pch: the shape (plotting character) of the data points

col: the color of the data points
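
As a minimal sketch of the type values described above, the following draws an empty frame with type = "n" and then adds two series in different styles. The vectors x, y1 and y2 are made-up illustration data and do not come from the original notes.

```r
# Made-up illustration data (not from the original notes)
x  <- 1:10
y1 <- c(2, 4, 3, 6, 5, 8, 7, 9, 8, 10)
y2 <- rev(y1)

# type = "n" draws axes and labels only; ylim must cover both series
plot(x, y1, type = "n", ylim = range(c(y1, y2)),
     xlab = "x", ylab = "value")

# Add the first series as points joined by lines ("o")
lines(x, y1, type = "o", pch = 2, col = "red")

# Add the second series as a step function ("s")
lines(x, y2, type = "s", col = "blue")
```

Setting up the frame first keeps the axis limits under explicit control, which matters when the series come from different sources and have different ranges.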

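II. Other plotting functions

1. Pie chart: pie()

2. The histogram is the most common way to show the distribution of a numeric variable.

hist(): base R; a histogram of the number of observations falling in each bin

truehist(): MASS package; a normalized histogram that gives an estimate of the probability density

A density plot can be thought of as a smoothed histogram, for example lines(density(x)) added on top of an existing histogram.

One limitation of histograms and density plots is that it is hard to judge from them whether the data follow a Gaussian (normal) distribution.

A normal QQ-plot answers that question more directly: use qqPlot() from the car package, or qqnorm() with qqline() in base R. (Base R's qqplot() compares two samples with each other rather than one sample with the normal distribution.)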
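
The three views can be compared side by side. This is only a sketch using base R functions and the built-in faithful data set, which is my choice for illustration and not part of the original notes.

```r
# faithful ships with base R; eruption times are visibly bimodal
x <- faithful$eruptions

old_par <- par(mfrow = c(1, 2))

# Histogram on the density scale so the density curve is comparable
hist(x, freq = FALSE, main = "Histogram with density",
     xlab = "Eruption time (min)")
dens <- density(x)
lines(dens, lwd = 2)

# Normal QQ-plot: points far from the reference line suggest non-normality
qqnorm(x)
qqline(x, lty = 2)

par(old_par)
```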

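3. The sunflowerplot() function

Each point in a scatterplot corresponds to one (x, y) pair. If the same pair occurs several times, the points are drawn on top of each other and the repeats cannot be seen. There are several remedies. One is jittering: adding a small random value to each x and y so that repeated points show up as a cluster of nearby points. Another effective option is sunflowerplot(), which draws each repeated point as a "sunflower" with one petal per observation at that location.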
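
A small sketch of both remedies, assuming the built-in mtcars data (chosen here only because cyl and gear take few distinct values, so many pairs repeat):

```r
# mtcars ships with base R
x <- mtcars$cyl
y <- mtcars$gear

old_par <- par(mfrow = c(1, 3))

# Raw scatterplot: repeated pairs are hidden behind a single symbol
plot(x, y, main = "Raw")

# jitter() adds a small amount of random noise to each value
plot(jitter(x), jitter(y), main = "Jittered")

# sunflowerplot() invisibly returns the distinct points and their counts
sf <- sunflowerplot(x, y, main = "Sunflower plot")

par(old_par)

# Number of observations at each distinct (x, y) location
data.frame(cyl = sf$x, gear = sf$y, count = sf$number)
```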

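4. The boxplot() function

boxplot() shows how a numeric variable y is distributed for each unique value of a variable x. x should not have too many unique values; with more than about 10 the plot becomes hard to read.

Optional arguments:

varwidth: lets the box widths vary to reflect the size of each data subset

log: applies a log transformation to an axis, for example log = "y" for the y values

las: controls the orientation of the axis labels for readability (las = 1 makes them all horizontal)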

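# Create a variable-width boxplot with a log-scaled y-axis and horizontal labels
# (y and x are placeholders for columns of Boston from the MASS package, e.g. crim ~ rad)
boxplot(y ~ x, data = Boston, varwidth = TRUE, log = "y", las = 1)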

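5. Mosaic plot: mosaicplot()

A mosaic plot can be viewed as a scatterplot between categorical variables. It can also be used to examine relationships between numeric variables, most usefully when they take only a few distinct values.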
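
For illustration, a minimal mosaic plot of two numeric variables with few distinct values. The mtcars data set is an assumption made for this sketch and is not used in the original notes.

```r
# Cross-tabulate two mtcars variables (base R) that take few distinct values
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Tile areas are proportional to the cell counts
mosaicplot(tab, main = "Cylinders by gears", color = TRUE)
```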

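6. bagplot()

A simple boxplot summarizes the range of variation of a numeric variable with five numbers:

the maximum, the minimum, the median, and the upper and lower quartiles.

The standard boxplot computes a nominal data range from the quartiles (by default extending 1.5 times the interquartile range beyond each quartile) and marks points outside that range as outliers, drawn as individual points. A bagplot shows the relationship between two numeric variables: a two-dimensional "bag" plays the role of the box in the standard boxplot, and outliers are marked individually. bagplot() is provided by the aplpack package.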
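
The outlier rule can be worked through by hand. The sketch below uses base R only and the built-in mtcars data (an illustrative choice); note that R's boxplot uses "hinges" from fivenum(), which are close to but not always identical to the quartiles.

```r
x <- mtcars$hp  # mtcars ships with base R

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

# Default rule: the nominal range reaches 1.5 * IQR beyond each hinge
iqr   <- five[4] - five[2]
lower <- five[2] - 1.5 * iqr
upper <- five[4] + 1.5 * iqr

# Points outside [lower, upper] are drawn individually as outliers
manual_out <- x[x < lower | x > upper]

# boxplot.stats() performs the same calculation internally
bs <- boxplot.stats(x)

boxplot(x, horizontal = TRUE, main = "Horsepower")
```

Comparing manual_out with bs$out shows that the hand calculation and the built-in rule flag the same observations.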

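7. Visualizing a correlation matrix with corrplot()

A correlation matrix is a useful tool for getting a first look at the relationships among several numeric variables.

In the plot, long narrow ellipses indicate a large correlation between the two variables, and nearly circular ellipses indicate a correlation close to 0.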

# Load the corrplot library for the corrplot() function
library(corrplot)

# Compute the correlation matrix for these variables
corrMat <- cor(data)

# Generate the correlation ellipse plot
corrplot(corrMat,method="ellipse")

8. Building and plotting an rpart() model

Decision trees are easy to view and interpret, which makes them a popular kind of predictive model.

# Load the rpart library

library(rpart)

# Fit an rpart model to predict medv from all other Boston variables
tree_model <- rpart(medv~.,data=Boston)

# Plot the structure of this decision tree model
plot(tree_model)

# Add labels to this plot
text(tree_model,cex=0.7)

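9. Using symbols() to show relationships among more than two variables

A scatterplot shows how one numeric variable changes with a second. symbols() extends the scatterplot to show the influence of further variables. The circles argument creates a bubble plot in which each data point is drawn as a circle whose radius is based on the value of a third variable.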

# Call symbols() to create the default bubbleplot
# (Cylinders is a factor, so it is converted explicitly to its integer level codes)
symbols(Cars93$Horsepower, Cars93$MPG.city,
circles = as.numeric(Cars93$Cylinders))

# Repeat, with the inches argument specified
symbols(Cars93$Horsepower, Cars93$MPG.city,
circles = as.numeric(Cars93$Cylinders),
inches = 0.2)

10. Lattice graphics example

# Load the lattice package
library(lattice)

# Use xyplot() to construct the conditional scatterplot
xyplot(calories ~ sugars | shelf, data = UScereal)

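III. The graphics environment function par()

par() sets graphics parameters, and each setting remains in effect until it is reset by a later par() call.

Calling par() with no arguments returns the current values of all graphics parameters.

Example: create a plot array with one row and two columns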

par(mfrow = c(1, 2))
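
Because par() settings persist, a common pattern is to save the previous values and restore them afterwards. A minimal sketch, using the built-in mtcars data purely for illustration:

```r
# par() returns the previous values of the parameters it changes
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot(mtcars$hp, mtcars$mpg, main = "mpg vs hp")
hist(mtcars$mpg, main = "mpg")

# Query a single parameter by name
par("mfrow")

# Restore the earlier settings so later plots are unaffected
par(old_par)
```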

IV. Adding details to plots

1. lines() adds lines to an existing plot

# Create the numerical vector x
x <- seq(0, 10, length = 200)

# Compute the Gaussian density for x with mean 2 and standard deviation 0.2
gauss1 <- dnorm(x, mean = 2, sd = 0.2)

# Compute the Gaussian density with mean 4 and standard deviation 0.5
gauss2 <- dnorm(x, mean = 4, sd = 0.5)

# Plot the first Gaussian density
plot(x, gauss1, type = "l", ylab = "Gaussian probability density")

# Add lines for the second Gaussian density
lines(x, gauss2, lty = 2, lwd = 3)

2. points()

In plot() or points(), the pch argument can be set from a variable in the data.

# Create an empty plot using type = "n"
plot(mtcars$hp, mtcars$mpg, type = "n",
xlab = "Horsepower", ylab = "Gas mileage")

# Add points with shapes determined by cylinder number
points(mtcars$hp, mtcars$mpg, pch = mtcars$cyl)

# Create a second empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
xlab = "Horsepower", ylab = "Gas mileage")

# Add points with shapes as cylinder characters
points(mtcars$hp, mtcars$mpg,
pch = as.character(mtcars$cyl))

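3. Adding a trend line from a linear regression model

abline() adds a straight line to an existing plot. The line is specified by an intercept a and a slope b.

For example, abline(a = 0, b = 1) adds an equality reference line (y = x), with intercept 0 and slope 1.

The parameters can also be supplied by a linear regression model: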

# Build a linear regression model for the whiteside data
linear_model <- lm(Gas ~ Temp, data = whiteside)

# Create a Gas vs. Temp scatterplot from the whiteside data
plot(whiteside$Temp, whiteside$Gas)

# Use abline() to add the linear regression line
abline(linear_model, lty = 2)

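4. Annotating plot features with text()

Arguments:

x: the x-coordinates of the labels
y: the y-coordinates of the labels
labels: the label text for each (x, y) pair

adj: horizontal justification, normally a value between 0 and 1: 0 left-justifies the text at x (so it extends to the right), 1 right-justifies it, and 0.5 centers it. On most devices values outside this range also work: values below 0 push the text further to the right of x, values above 1 further to the left

cex: text size as a multiple of the default

font: the font face (1 plain, 2 bold, 3 italic, 4 bold italic)

srt: rotation of the text, in degrees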
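
A short sketch of these arguments in use. The mtcars data set and the choice of which points to label are assumptions made for illustration only.

```r
# Scatterplot of mtcars (base R) with extra room on the right for labels
plot(mtcars$hp, mtcars$mpg, xlim = c(50, 450),
     xlab = "Horsepower", ylab = "Gas mileage")

# Label the most powerful car: adj = 0 starts the text at x, extending right
i <- which.max(mtcars$hp)
text(mtcars$hp[i] + 5, mtcars$mpg[i], labels = rownames(mtcars)[i],
     adj = 0, cex = 0.8, font = 2)

# Label the most economical car with smaller, italic, rotated text
j <- which.max(mtcars$mpg)
text(mtcars$hp[j] + 5, mtcars$mpg[j], labels = rownames(mtcars)[j],
     adj = 0, cex = 0.7, font = 3, srt = -20)
```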

5. legend()

Adds an explanatory key to a plot

legend("topright", pch = c(17, 1), legend = c("Before", "After"))
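
The legend only makes sense next to a plot that uses the same symbols. The sketch below is one way to put the call in context, assuming the whiteside data from the MASS package (used elsewhere in these notes); the axis labels are my own wording.

```r
library(MASS)  # provides the whiteside data

# Filled triangles (17) for "Before", open circles (1) for "After"
shapes <- ifelse(whiteside$Insul == "Before", 17, 1)

plot(whiteside$Temp, whiteside$Gas, pch = shapes,
     xlab = "Outside temperature", ylab = "Gas consumption")

# The legend must repeat the same pch codes that the plot used
legend("topright", pch = c(17, 1), legend = c("Before", "After"))
```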

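6. Adding custom axes with axis()

To use your own axis labels, pass axes = FALSE to the plotting function to suppress the default axes, then call axis() to draw custom ones.

Arguments of axis():

side: which side of the plot the axis is drawn on: 1 bottom, 2 left, 3 top, 4 right

at: the positions at which tick marks are drawn

labels: the label for each tick mark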

# Create a boxplot of sugars by shelf value, without axes
boxplot(sugars ~ shelf, data = UScereal,
axes = FALSE)

# Add a default y-axis to the left of the boxplot
axis(side = 2)

# Add an x-axis below the plot, labelled 1, 2, and 3
axis(side = 1)

# Add a second x-axis above the plot
axis(side = 3, at = c(1, 2, 3),
labels = c("floor", "middle", "top"))

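7. Adding a smooth trend curve with supsmu()

Some scatterplots clearly do not follow a linear trend, and a curve is needed to bring out the behavior of the data. The bass argument controls the smoothness of the trend curve: the default is 0, and larger values (up to 10) give a smoother curve.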

# Create a scatterplot of MPG.city vs. Horsepower
plot(Cars93$Horsepower, Cars93$MPG.city)

# Call supsmu() to generate a smooth trend curve, with default bass
trend1 <- supsmu(Cars93$Horsepower, Cars93$MPG.city)

# Add this trend curve to the plot
lines(trend1)

# Call supsmu() for a second trend curve, with bass = 10
trend2 <- supsmu(Cars93$Horsepower, Cars93$MPG.city,
bass = 10)

# Add this trend curve as a heavy, dotted line
lines(trend2, lty = 3, lwd = 2)

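V. How many scatterplots are too many?

matplot() draws several scatterplots on the same set of axes. By default the points are drawn as the digits 1 to n, where n is the number of scatterplots included. (The code below refers to a data frame df that these notes do not define; a data set with these columns, such as UScereal from MASS, is assumed.)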

# Set up a two-by-two plot array
par(mfrow = c(2, 2))

# Use matplot() to generate an array of two scatterplots
matplot(df$calories, df[, c("protein", "fat")],
xlab = "calories", ylab = "")

# Add a title
title("Two scatterplots")

# Use matplot() to generate an array of three scatterplots
matplot(df$calories, df[, c("protein", "fat", "fibre")],
xlab = "calories", ylab = "")

# Add a title
title("Three scatterplots")

# Use matplot() to generate an array of four scatterplots
matplot(df$calories,
df[, c("protein", "fat", "fibre", "carbo")],
xlab = "calories", ylab = "")

# Add a title
title("Four scatterplots")

# Use matplot() to generate an array of five scatterplots
matplot(df$calories,
df[, c("protein", "fat", "fibre", "carbo", "sugars")],
xlab = "calories", ylab = "")

# Add a title
title("Five scatterplots")

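VI. How many words are too many?

wordcloud() (from the wordcloud package) displays words at sizes that reflect how often they occur: frequent words are drawn larger and rare words smaller.

First argument (words): a character vector of the words

Second argument (freq): a numeric vector giving the number of occurrences of each word

scale: a numeric vector of length two giving the relative sizes of the largest and smallest words

min.freq: only words occurring at least min.freq times are included in the cloud; the default is 3.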

# Create the wordcloud of all model names with smaller scaling
wordcloud(words = names(model_table),
freq = as.numeric(model_table),
scale = c(0.75, 0.25),
min.freq = 1)

VII. Viewing data with several kinds of plots

# Set up a two-by-two plot array
par(mfrow = c(2, 2))

# Plot the raw duration data
plot(geyser$duration, main = "Raw data")

# Plot the normalized histogram of the duration data
truehist(geyser$duration, main = "Histogram")

# Plot the density of the duration data
plot(density(geyser$duration), main = "Density")

# Construct the normal QQ-plot of the duration data
qqPlot(geyser$duration, main = "QQ-plot")

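VIII. Building and displaying layout matrices

1. Use matrix() to build a matrix of plot positions, pass it to layout() to set up the plot array, and call layout.show() to verify the arrangement.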

# Define row1, row2, row3 for plots 1, 2, and 3
row1 <- c(0, 1)
row2 <- c(2, 0)
row3 <- c(0, 3)

# Use the matrix function to combine these rows into a matrix
layoutMatrix <- matrix(c(row1, row2, row3),
byrow = TRUE, nrow = 3)

# Call the layout() function to set up the plot array
layout(layoutMatrix)

# Show where the three plots will go
layout.show(3)

2. Creating the plot array

# Set up the plot array
layout(layoutMatrix)

# Construct the vectors indexB and indexA
indexB <- which(whiteside$Insul == "Before")
indexA <- which(whiteside$Insul == "After")

# Create plot 1 and add title
plot(whiteside$Temp[indexB], whiteside$Gas[indexB],
ylim = c(0, 8))
title("Before data only")

# Create plot 2 and add title
plot(whiteside$Temp, whiteside$Gas,
ylim = c(0, 8))
title("Complete dataset")

# Create plot 3 and add title
plot(whiteside$Temp[indexA], whiteside$Gas[indexA],
ylim = c(0, 8))
title("After data only")

3. Creating an array with plots of different sizes

# Create row1, row2, and layoutVector
row1 <- c(1, 0, 0)
row2 <- c(0, 2, 2)
layoutVector <- c(row1, rep(row2, 2))

# Convert layoutVector into layoutMatrix
layoutMatrix <- matrix(layoutVector, byrow = TRUE, nrow = 3)

# Set up the plot array
layout(layoutMatrix)

# Plot scatterplot
plot(Boston$rad, Boston$zn)

# Plot sunflower plot
sunflowerplot(Boston$rad, Boston$zn)

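IX. Plotting functions can return useful information

Besides drawing the plot, barplot() returns a numeric vector giving the center position of each bar.

This return value is useful for placing text on the bars of a horizontal bar plot: capture it and pass it as the y argument of text(), and the text can be placed at any x position, vertically centered on each bar.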

# Create a table of Cylinders frequencies
tbl <- table(Cars93$Cylinders)

# Generate a horizontal barplot of these frequencies
mids <- barplot(tbl, horiz = TRUE,
col = "transparent",
names.arg = "")

# Add names labels with text()
text(20, mids, names(tbl))

# Add count labels with text()
text(35, mids, as.numeric(tbl))
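
The same return value works for vertical bars, where the midpoints become x positions. A minimal sketch, assuming the built-in mtcars data in place of Cars93:

```r
# Frequencies of cylinder counts in mtcars (base R)
tbl <- table(mtcars$cyl)
counts <- as.numeric(tbl)

# Leave headroom above the tallest bar for the labels
mids <- barplot(tbl, ylim = c(0, max(counts) * 1.2),
                xlab = "Cylinders", ylab = "Count")

# For vertical bars the returned midpoints are x positions;
# pos = 3 places each label just above the given (x, y) point
text(mids, counts, labels = counts, pos = 3)
```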

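X. Saving plots to files

PNG files are easy to share and to attach to email. png() creates and names a PNG file and opens a graphics device that captures all plot output until dev.off() closes it.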

# Call png() with the name of the file we want to create
png("bubbleplot.png")

# Re-create the plot from the last exercise
symbols(Cars93$Horsepower, Cars93$MPG.city,
circles = Cars93$Cylinders,
inches = 0.2)

# Save our file and return to our interactive session
dev.off()

# Verify that we have created the file
list.files(pattern = "png")
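
png() also accepts size and resolution arguments. The sketch below writes to a temporary directory so that the working directory is left untouched; the file name and dimensions are arbitrary choices for illustration.

```r
# Hypothetical output location inside R's per-session temporary directory
out_file <- file.path(tempdir(), "scatter_example.png")

# width and height are in pixels by default; res is the nominal resolution in ppi
png(out_file, width = 800, height = 600, res = 120)

plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower", ylab = "Gas mileage")

# The file is complete only once the device has been closed
dev.off()

# Confirm that the file was written
file.exists(out_file)
```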

XI. Colors in plots

1. Twelve recommended colors

IScolors <- c("red", "green", "yellow", "blue","black", "white", "pink", "cyan","gray", "orange", "brown", "purple")
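
To preview the palette, one option is to draw a bar per color. This is only a sketch; the check against colors() confirms that each name is one of R's built-in color names.

```r
IScolors <- c("red", "green", "yellow", "blue",
              "black", "white", "pink", "cyan",
              "gray", "orange", "brown", "purple")

# Check that every name is a built-in R color name
all(IScolors %in% colors())

# One bar per color, labelled with its name (las = 2 turns the labels)
barplot(rep(1, length(IScolors)), col = IScolors,
        names.arg = IScolors, las = 2, axes = FALSE)
```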

2. Using color to enhance a bubble plot

# Iliinsky and Steele color name vector
IScolors <- c("red", "green", "yellow", "blue",
"black", "white", "pink", "cyan",
"gray", "orange", "brown", "purple")

# Create the colored bubbleplot
symbols(Cars93$Horsepower, Cars93$MPG.city,
circles = Cars93$Cylinders, inches = 0.2,
bg = IScolors[as.numeric(Cars93$Cylinders)])

3. Using color to enhance a stacked bar plot

By default, barplot() draws the segments of each bar in different shades of gray.

# Create a table of Cylinders by Origin
tbl <- table(Cars93$Cylinders, Cars93$Origin)

# Create the default stacked barplot
barplot(tbl)

# Enhance this plot with color
barplot(tbl, col = IScolors)
