R Programming-Week 1-Data Types

2019-04-10 17:10:19

Objects

R has five basic or “atomic” classes of objects:

character

numeric (real numbers)

integer

complex

logical (TRUE/FALSE)

The most basic object is a vector

A vector can only contain objects of the same class.

BUT: The one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that’s usually why we use them).

Empty vectors can be created with the vector() function.
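
The points above can be sketched in a few lines. This is a minimal illustration, not part of the original notes; the variable names are made up for the example. class() reports the class of a value, and vector() creates a vector of a given mode filled with that mode's default value.

```r
# One value of each atomic class (names are illustrative only).
chr <- "hello"
num <- 3.14
int <- 42L
cpx <- 1 + 2i
lgl <- TRUE

classes <- c(class(chr), class(num), class(int), class(cpx), class(lgl))
print(classes)

# vector() creates a vector of the requested mode and length.
empty_chr <- vector("character", length = 3)
print(empty_chr)

# A list is a vector whose elements may have different classes.
mixed <- list(1L, "a", TRUE)
mixed_classes <- sapply(mixed, class)
print(mixed_classes)
```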

Numbers

Numbers in R are generally treated as numeric objects (i.e. double precision real numbers).

If you explicitly want an integer, you need to specify the L suffix.

Ex: Entering 1 gives you a numeric object; entering 1L explicitly gives you an integer.

There is also a special number Inf which represents infinity; e.g. 1 / 0; Inf can be used in ordinary calculations; e.g. 1 / Inf is 0.

The value NaN represents an undefined value (“not a number”); e.g. 0 / 0; NaN can also be thought of as a missing value (more on that later).
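
A short sketch of these rules, written as a script rather than a console session. The variable names are illustrative and not from the original notes.

```r
x <- 1    # numeric (double precision)
y <- 1L   # integer, because of the L suffix
print(class(x))
print(class(y))

inf_val <- 1 / 0    # division by zero gives Inf
zero    <- 1 / Inf  # Inf can be used in ordinary arithmetic
nan_val <- 0 / 0    # an undefined result gives NaN

print(is.infinite(inf_val))
print(zero)
print(is.nan(nan_val))
```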

Attributes

R objects can have attributes

names, dimnames

dimensions (e.g. matrices, arrays)

class

length

other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
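
As a rough illustration, the sketch below inspects the attributes of a matrix and attaches a user-defined one. The attribute name "units" is hypothetical and chosen only for this example. Note that a plain vector carries no stored attributes, and its length is queried with length() rather than returned by attributes().

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
print(attributes(m))          # a list with a $dim element

# Attach a user-defined attribute ("units" is a made-up name).
attr(m, "units") <- "cm"
attr_names <- names(attributes(m))
print(attr_names)

v <- 1:3
print(attributes(v))          # NULL for a plain vector
print(length(v))
```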

Creating Vectors

The c() function can be used to create vectors of objects.

Using the vector() function

> x <- vector("numeric", length = 10)

> x

[1] 0 0 0 0 0 0 0 0 0 0

Mixing Objects

> y <- c(1.7, "a") ## character

> y <- c(TRUE, 2) ## numeric

> y <- c("a", TRUE) ## character

When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
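
To see what the coerced values actually look like, the three cases above can be run as a script. The names y1, y2 and y3 are used here only to keep all three results available at once.

```r
y1 <- c(1.7, "a")    # character: 1.7 becomes "1.7"
y2 <- c(TRUE, 2)     # numeric: TRUE becomes 1
y3 <- c("a", TRUE)   # character: TRUE becomes "TRUE"

print(y1)
print(y2)
print(y3)
print(c(class(y1), class(y2), class(y3)))
```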

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

> x <- 0:6

> class(x)

[1] "integer"

> as.numeric(x)

[1] 0 1 2 3 4 5 6

> as.logical(x)

[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE

> as.character(x)

[1] "0" "1" "2" "3" "4" "5" "6"

Nonsensical coercion results in NAs.

> x <- c("a", "b", "c")

> as.numeric(x)

[1] NA NA NA

Warning message:

NAs introduced by coercion

> as.logical(x)

[1] NA NA NA

> as.complex(x)

[1] NA NA NA

Warning message:

NAs introduced by coercion
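
The sketch below contrasts a coercion that makes sense with one that only partly does. The vector chars is a made-up input, and suppressWarnings() is used only to keep the script output quiet; in interactive work the warning is usually worth seeing.

```r
x <- 0:6
cx <- as.complex(x)   # integers become complex numbers with zero imaginary part
print(cx)

# Made-up input: two strings that parse as numbers, one that does not.
chars <- c("1", "2.5", "abc")
nums <- suppressWarnings(as.numeric(chars))  # hides "NAs introduced by coercion"
print(nums)
print(is.na(nums))
```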

Lists

Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

> x <- list(1, "a", TRUE, 1 + 4i)

> x

[[1]]

[1] 1

[[2]]

[1] "a"

[[3]]

[1] TRUE

[[4]]

[1] 1+4i
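
A brief sketch of how the elements of such a list keep their own classes, and of the difference between the two subsetting operators. The names second and sub are illustrative.

```r
x <- list(1, "a", TRUE, 1 + 4i)

# Each element keeps its own class.
elem_classes <- sapply(x, class)
print(elem_classes)

second <- x[[2]]  # [[ ]] extracts the element itself
sub    <- x[2]    # [ ] returns a list of length 1
print(class(second))
print(class(sub))
```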

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).

> m <- matrix(nrow = 2, ncol = 3)

> m

     [,1] [,2] [,3]

[1,]   NA   NA   NA

[2,]   NA   NA   NA

> dim(m)

[1] 2 3

> attributes(m)

$dim

[1] 2 3

Matrices (cont’d)

Matrices are constructed column-wise, so entries can be thought of as starting in the “upper left” corner and running down the columns.

> m <- matrix(1:6, nrow = 2, ncol = 3)

> m

     [,1] [,2] [,3]

[1,]    1    3    5

[2,]    2    4    6

Matrices can also be created directly from vectors by adding a dimension attribute.

> m <- 1:10

> m

[1] 1 2 3 4 5 6 7 8 9 10

> dim(m) <- c(2, 5)

> m

     [,1] [,2] [,3] [,4] [,5]

[1,]    1    3    5    7    9

[2,]    2    4    6    8   10
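
The column-wise fill order can be checked by indexing individual cells. The sketch below also uses the byrow argument of matrix(), which is not covered in the notes above, to show the row-wise alternative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m[2, 1])   # second value of 1:6: fill runs down the first column
print(m[1, 2])   # third value of 1:6: top of the second column

# byrow = TRUE fills across rows instead.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row[1, 2])

# Adding a dim attribute to a vector gives the same layout as matrix().
v <- 1:10
dim(v) <- c(2, 5)
print(all(v == matrix(1:10, nrow = 2)))
```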

cbind-ing and rbind-ing

Matrices can be created by column-binding or row-binding with cbind() and rbind().

> x <- 1:3

> y <- 10:12

> cbind(x, y)

     x  y

[1,] 1 10

[2,] 2 11

[3,] 3 12

> rbind(x, y)

  [,1] [,2] [,3]

x    1    2    3

y   10   11   12
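
A small sketch of the shapes and names these two functions produce. The names cb and rb are illustrative.

```r
x <- 1:3
y <- 10:12

cb <- cbind(x, y)   # vectors become columns
rb <- rbind(x, y)   # vectors become rows

print(dim(cb))
print(dim(rb))
print(colnames(cb)) # variable names become column names
print(rownames(rb)) # variable names become row names

# Transposing one layout gives the values of the other.
print(all(t(cb) == rb))
```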

Factors

Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

Factors are treated specially by modelling functions like lm() and glm().

Using factors with labels is better than using integers because factors are self-describing; having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

> x <- factor(c("yes", "yes", "no", "yes", "no"))

> x

[1] yes yes no yes no

Levels: no yes

> table(x)

x

 no yes

  2   3

> unclass(x)

[1] 2 2 1 2 1

attr(,"levels")

[1] "no" "yes"

The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

> x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))

> x

[1] yes yes no yes no

Levels: yes no
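
The effect of the levels argument on the underlying integer codes can be sketched as follows. The names answers, f_default and f_custom are made up for the example.

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default <- factor(answers)                          # levels sorted: "no", "yes"
f_custom  <- factor(answers, levels = c("yes", "no")) # explicit order

print(levels(f_default))
print(levels(f_custom))

# The integer codes follow the level order.
print(as.integer(f_default))
print(as.integer(f_custom))
```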

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

is.na() is used to test whether objects are NA

is.nan() is used to test for NaN

NA values have a class also, so there are integer NA, character NA, etc.

A NaN value is also NA, but the converse is not true

> x <- c(1, 2, NA, 10, 3)

> is.na(x)

[1] FALSE FALSE TRUE FALSE FALSE

> is.nan(x)

[1] FALSE FALSE FALSE FALSE FALSE

> x <- c(1, 2, NaN, NA, 4)

> is.na(x)

[1] FALSE FALSE TRUE TRUE FALSE

> is.nan(x)

[1] FALSE FALSE TRUE FALSE FALSE
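
The typed NA constants built into R make the "NA values have a class" point concrete. The sketch also shows one way to pick out values that are NA but not NaN; the name only_na is illustrative.

```r
# NA constants of different classes.
na_classes <- c(class(NA), class(NA_integer_), class(NA_real_), class(NA_character_))
print(na_classes)

x <- c(1, 2, NaN, NA, 4)

# is.na() is TRUE for both NA and NaN; is.nan() only for NaN.
only_na <- is.na(x) & !is.nan(x)
print(only_na)
print(sum(is.na(x)))
```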

Data Frames

Data frames are used to store tabular data

They are represented as a special type of list where every element of the list has to have the same length

Each element of the list can be thought of as a column and the length of each element of the list is the number of rows

Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class

Data frames also have a special attribute called row.names

Data frames are usually created by calling read.table() or read.csv()

Data frames can be converted to a matrix by calling data.matrix()

> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))

> x

  foo   bar

1   1  TRUE

2   2  TRUE

3   3 FALSE

4   4 FALSE

> nrow(x)

[1] 4

> ncol(x)

[1] 2
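
A short sketch of the list-like nature of a data frame and of the conversion with data.matrix(). The name df is used instead of x only for readability, and TRUE/FALSE are spelled out because T and F are ordinary variables that can be reassigned.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

print(is.list(df))         # a data frame is a special kind of list
col_classes <- sapply(df, class)
print(col_classes)         # each column keeps its own class
print(row.names(df))       # default row names

# data.matrix() converts every column to numbers, so the logical
# column bar becomes 1 for TRUE and 0 for FALSE.
mat <- data.matrix(df)
print(dim(mat))
print(mat[, "bar"])
```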

Names

R objects can also have names, which is very useful for writing readable code and self-describing objects.

> x <- 1:3

> names(x)

NULL

> names(x) <- c("foo", "bar", "norf")

> x

 foo  bar norf

   1    2    3

> names(x)

[1] "foo"  "bar"  "norf"
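
Names are not limited to atomic vectors. As a hedged sketch going slightly beyond the example above, lists can carry element names and matrices can carry row and column names through dimnames(). The names lst and m and the labels are illustrative.

```r
# Lists can have names.
lst <- list(a = 1, b = 2, c = 3)
print(names(lst))
print(lst$b)           # elements can be looked up by name

# Matrices have dimnames: a list of row names and column names.
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
print(m)
print(m["a", "d"])     # row "a", column "d"
```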

Summary

Data Types

atomic classes: numeric, logical, character, integer, complex

vectors, lists

factors

missing values

data frames

names
