Questions tagged [r-faq]

The r-faq tag was created to group a limited number of questions about problems that come up regularly under the R tag. It is not the official R FAQ for Stack Overflow, but it should serve as a useful source of information on common problems.

2468votes
23answers
388kviews

How to make a great R reproducible example

When discussing performance with colleagues, teaching, sending a bug report or searching for guidance on mailing lists and here on Stack Overflow, a reproducible example is often asked and always ...
1442votes
13answers
1.6mviews

How to join (merge) data frames (inner, outer, left, right)

Given two data frames: df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1))) ...
334votes
12answers
324kviews

How to reshape data from long to wide format

I'm having trouble rearranging the following data frame: set.seed(45) dat1 <- data.frame( name = rep(c("firstName", "secondName"), each=4), numbers = rep(1:4, 2), value = rnorm(8) )...
215votes
8answers
208kviews

Reshaping data.frame from wide to long format

I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this: Code Country 1950 1951 1952 1953 1954 AFG Afghanistan 20,249 ...
289votes
6answers
58kviews

Why are these numbers not equal?

The following code is obviously wrong. What's the problem? i <- 0.1 i <- i + 0.05 i ## [1] 0.15 if(i==0.15) cat("i equals 0.15") else cat("i does not equal 0.15") ## i does not equal 0.15
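
A minimal sketch of why the comparison above fails and one common way around it. It assumes the goal is a tolerance-based comparison; the variable names `exact` and `approx` are illustrative and not part of the original question.

```r
i <- 0.1
i <- i + 0.05

# Exact comparison: 0.1, 0.05 and 0.15 have no exact binary
# floating-point representation, so the sum differs slightly from 0.15.
exact <- (i == 0.15)

# Tolerance-based comparison: all.equal() returns TRUE or a description
# of the difference, so wrap it in isTRUE() before using it in if().
approx <- isTRUE(all.equal(i, 0.15))

if (approx) cat("i equals 0.15 (within tolerance)\n")
```

`all.equal()` uses a small default numeric tolerance; whether that tolerance suits a given problem is a judgment call.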
456votes
15answers
826kviews

How to sum a variable by group

I have a data frame with two columns. First column contains categories such as "First", "Second", "Third", and the second column has numbers that represent the number of times I saw the specific ...
236votes
10answers
312kviews

How do I make a list of data frames?

How do I make a list of data frames and how do I access each of those data frames from the list? For example, how can I put these data frames in a list ? d1 <- data.frame(y1 = c(1, 2, 3), ...
177votes
8answers
212kviews

Aggregate / summarize multiple variables per group (e.g. sum, mean)

From a data frame, is there an easy way to aggregate (sum, mean, max, etc.) multiple variables simultaneously? Below are some sample data: library(lubridate) days = 365*2 date = seq(as.Date("2000-...
137votes
6answers
44kviews

Split comma-separated strings in a column into separate rows

I have a data frame, like so: data.frame(director = c("Aaron Blaise,Bob Walker", "Akira Kurosawa", "Alan J. Pakula", "Alan Parker", "Alejandro Amenabar", "Alejandro Gonzalez ...
1115votes
10answers
425kviews

Grouping functions (tapply, by, aggregate) and the *apply family

Whenever I want to do something "map"py in R, I usually try to use a function in the apply family. However, I've never quite understood the differences between them -- how {sapply, lapply, etc.} ...
95votes
5answers
45kviews

Collapse / concatenate / aggregate a column to a single comma separated string within each group

I want to aggregate one column in a data frame according to two grouping variables, and separate the individual values by a comma. Here is some data: data <- data.frame(A = c(rep(111, 3), rep(222,...
668votes
11answers
1.1mviews

How to convert a factor to integer\numeric without loss of information?

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers. f <- factor(sample(runif(5), 20, replace = TRUE)) ## [1] 0.0248644019011408 0....
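
A short sketch of the behaviour described above, using a small hypothetical factor rather than the random one in the question. It shows the level codes next to two commonly used ways of recovering the values.

```r
# Hypothetical factor whose levels are numbers stored as text
f <- factor(c("10.5", "3.2", "10.5", "7"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)

# Convert via the labels to recover the original values
values <- as.numeric(as.character(f))

# Equivalent: convert the levels once, then index by the factor
values2 <- as.numeric(levels(f))[f]

print(values)
```

The second form converts each distinct level only once, which can matter for long factors with few levels.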
159votes
10answers
150kviews

Dynamically select data frame columns using $ and a character value

I have a vector of different column names and I want to be able to loop over each of them to extract that column from a data.frame. For example, consider the data set mtcars and some variable names ...
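
One possible approach, sketched under the assumption that the column names are held in a character vector (here the hypothetical `cols`): `$` takes its argument literally, so `[[` is used with the character value instead.

```r
# Hypothetical vector of column names from the built-in mtcars data set
cols <- c("mpg", "cyl", "wt")

for (nm in cols) {
  # mtcars$nm would look for a column literally called "nm";
  # mtcars[[nm]] evaluates nm and extracts the named column.
  x <- mtcars[[nm]]
  cat(nm, "mean:", mean(x), "\n")
}

first_col <- mtcars[[cols[1]]]
```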
347votes
14answers
445kviews

Order Bars in ggplot2 bar graph

I am trying to make a bar graph where the largest bar would be nearest to the y axis and the shortest bar would be furthest. So this is kind of like the Table I have Name Position 1 James ...
52votes
8answers
21kviews

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

I have a dataframe in a wide format, with repeated measurements taken within different date ranges. In my example there are three different periods, all with their corresponding values. E.g. the first ...
26votes
8answers
7kviews

Transpose / reshape dataframe without "timevar" from long to wide format

I have a data frame that follows the below long Pattern: Name MedName Name1 atenolol 25mg Name1 aspirin 81mg Name1 sildenafil 100mg Name2 atenolol 50mg Name2 ...
1435votes
19answers
1.3mviews

Sort (order) data frame rows by multiple columns

I want to sort a data frame by multiple columns. For example, with the data frame below I would like to sort by column 'z' (descending) then by column 'b' (ascending): dd <- data.frame(b = factor(c(...
288votes
18answers
437kviews

ggplot with 2 y axes on each side and different scales

I need to plot a bar chart showing counts and a line chart showing rate, all in one chart. I can do both of them separately, but when I put them together, the scale of the first layer (i.e. the geom_bar) ...
216votes
9answers
138kviews

Numbering rows within groups in a data frame

Working with a data frame similar to this: set.seed(100) df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15)) df <- df[order(df$cat, df$val), ...
201votes
10answers
510kviews

Error: could not find function ... in R

This is meant to be a FAQ question, so please be as complete as possible. The answer is a community answer, so feel free to edit if you think something is missing. This question was discussed and ...
615votes
12answers
287kviews

The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe

R provides two different methods for accessing the elements of a list or data.frame: [] and [[]]. What is the difference between the two, and when should I use one over the other?
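
A brief illustration of the distinction being asked about, using a hypothetical list `lst`. It is only a sketch of the basic behaviour and does not cover every case.

```r
# Hypothetical list with two named elements
lst <- list(a = 1:3, b = "x")

# Single bracket keeps the container: the result is a list of length 1
sub <- lst["a"]

# Double bracket extracts the element itself: an integer vector
elem <- lst[["a"]]

str(sub)
str(elem)
```

The same pattern applies to data frames: `df["col"]` returns a one-column data frame, while `df[["col"]]` returns the column vector.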
648votes
18answers
839kviews

How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning?

I tried to install a package, using install.packages("foobarbaz") but received the warning Warning message: package 'foobarbaz' is not available (for R version x.y.z) Why doesn't R think that the ...
309votes
9answers
237kviews

Simultaneously merge multiple data.frames in a list

I have a list of many data.frames that I want to merge. The issue here is that each data.frame differs in terms of the number of rows and columns, but they all share the key variables (which I've ...
180votes
19answers
118kviews

Replacing NAs with latest non-NA value

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following: > y ...
302votes
16answers
514kviews

Split data frame string column into multiple columns

I'd like to take data of the form before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2')) attr type 1 1 foo_and_bar 2 30 foo_and_bar_2 3 4 foo_and_bar 4 ...
639votes
11answers
367kviews

How can I view the source code for a function?

I want to look at the source code for a function to see how it works. I know I can print a function by typing its name at the prompt: > t function (x) UseMethod("t") <bytecode: 0x2332948> &...
86votes
8answers
185kviews

Calculate the mean by group

I have a large data frame that looks similar to this: df <- data.frame(dive = factor(sample(c("dive1","dive2"), 10, replace=TRUE)), speed = runif(10) ...
133votes
17answers
152kviews

Select the row with the maximum value in each group

In a dataset with multiple observations for each subject, I want to select, for each subject, the row which has the maximum value of 'pt'. For example, with the following dataset: ID <- c(1,1,1,2,2,...
276votes
15answers
448kviews

How to import multiple .csv files at once?

Suppose we have a folder containing multiple data.csv files, each containing the same number of variables but each from different times. Is there a way in R to import them all simultaneously rather ...
89votes
2answers
148kviews

How do I deal with special characters like \^$.?*|+()[{ in my regex?

I want to match a regular expression special character, \^$.?*|+()[{. I tried: x <- "a[b" grepl("[", x) ## Error: invalid regular expression '[', reason 'Missing ']'' (Equivalently stringr::...
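
Two ways the failing call above might be repaired, shown as a sketch for the `[` character only: escaping the metacharacter, or asking for a literal (non-regex) match. The result names `escaped` and `literal` are illustrative.

```r
x <- "a[b"

# Escape the metacharacter; the backslash itself must be doubled
# inside an R string literal.
escaped <- grepl("\\[", x)

# Treat the pattern as a plain string rather than a regular expression
literal <- grepl("[", x, fixed = TRUE)

print(c(escaped = escaped, literal = literal))
```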
178votes
9answers
356kviews

Filter data.frame rows by a logical condition

I want to filter rows from a data.frame based on a logical condition. Let's suppose that I have data frame like expr_value cell_type 1 5.345618 bj fibroblast 2 5.195871 bj fibroblast 3 ...
105votes
7answers
174kviews

Converting year and month ("yyyy-mm" format) to a date?

I have a dataset that looks like this: Month count 2009-01 12 2009-02 310 2009-03 2379 2009-04 234 2009-05 14 2009-08 1 2009-09 34 2009-10 2386 I want to plot the data (months as x values ...
142votes
17answers
322kviews

Count number of rows within each group

I have a dataframe and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows: df2 <- aggregate(x ~ Year + Month, data = df1, sum) ...
433votes
8answers
343kviews

How to add leading zeros?

I have a set of data which looks something like this: anim <- c(25499,25500,25501,25502,25503,25504) sex <- c(1,2,2,1,2,1) wt <- c(0.8,1.2,1.0,2.0,1.8,1.4) data <- data.frame(anim,sex,...
549votes
12answers
255kviews

Quickly reading very large tables as dataframes

I have very large tables (30 million rows) that I would like to load as dataframes in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the ...
341votes
5answers
562kviews

Plotting two variables as lines using ggplot2 on the same graph

A very newbish question, but say I have data like this: test_data <- data.frame( var0 = 100 + c(0, cumsum(runif(49, -20, 20))), var1 = 150 + c(0, cumsum(runif(49, -10, 10))), date = ...
40votes
11answers
25kviews

How to create a consecutive group number

I have a data frame (all_data) in which I have a list of sites (1... to n) and their scores e.g. site score 1 10 1 11 1 12 4 10 4 11 4 11 8 ...
166votes
3answers
345kviews

Add legend to ggplot2 line plot

I have a question about legends in ggplot2. I managed to plot three lines in the same graph and want to add a legend with the three colors used. This is the code used library(ggplot2) require(...
994votes
18answers
2.1mviews

Remove rows with all or some NAs (missing values) in data.frame

I'd like to remove the lines in this data frame that: a) contain NAs across all columns. Below is my example data frame. gene hsap mmul mmus rnor cfam 1 ENSG00000208234 0 NA NA ...
463votes
35answers
341kviews

How to find the statistical mode?

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there ...
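
Base R has no built-in function for the statistical mode, so a small helper is one option. The function below, with the hypothetical name `stat_mode`, is a sketch that returns the first most frequent value and ignores ties beyond that.

```r
# Hypothetical helper: most frequent value of a vector.
# On ties, the value that appears first in x is returned.
stat_mode <- function(x) {
  ux <- unique(x)
  # match() maps each element to its position in ux;
  # tabulate() counts how often each position occurs.
  ux[which.max(tabulate(match(x, ux)))]
}

stat_mode(c(1, 2, 2, 3))
stat_mode(c("a", "b", "b"))
```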
38votes
4answers
19kviews

Calculate group mean, sum, or other summary stats. and assign column to original data

I want to calculate mean (or any other summary statistics of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group"). The summary ...
581votes
16answers
433kviews

Drop unused factor levels in a subsetted data frame

I have a data frame containing a factor. When I create a subset of this dataframe using subset or another indexing function, a new data frame is created. However, the factor variable retains all of ...
236votes
10answers
169kviews

Use dynamic name for new column/variable in `dplyr`

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated. Example data from iris: library(dplyr) iris <- ...
63votes
2answers
103kviews

Split data.frame based on levels of a factor into new data.frames

I'm trying to create separate data.frame objects based on levels of a factor. So if I have: df <- data.frame( x=rnorm(25), y=rnorm(25), g=rep(factor(LETTERS[1:5]), 5) ) How can I split df ...
149votes
7answers
265kviews

Order discrete x scale by frequency/value

I am making a dodged bar chart using ggplot with a discrete x scale. The x axis is now arranged in alphabetical order, but I need to rearrange it so that it is ordered by the value of the y-axis (i.e., ...
44votes
7answers
33kviews

Find complement of a data frame (anti - join)

I have two data frames (df and df1). df1 is a subset of df. I want to get a data frame which is the complement of df1 in df, i.e. return rows of the first data set which are not matched in the second. For ...
179votes
9answers
561kviews

R memory management / cannot allocate vector of size n Mb

I am running into issues trying to use large objects in R. For example: > memory.limit(4000) > a = matrix(NA, 1500000, 60) > a = matrix(NA, 2500000, 60) > a = matrix(NA, 3500000, 60) ...
277votes
9answers
365kviews

Add regression line equation and R^2 on graph

I wonder how to add regression line equation and R^2 on the ggplot. My code is: library(ggplot2) df <- data.frame(x = c(1:100)) df$y <- 2 + 3 * df$x + rnorm(100, sd = 40) p <- ggplot(data = ...
33votes
4answers
25kviews

Subset data frame based on number of rows per group

I have data like this, where some "name" occurs more than three times: df <- data.frame(name = c("a", "a", "a", "b", "b", "c", "c", "c", "c"), x = 1:9) name x 1 a 1 2 a 2 3 a 3 4 b ...
89votes
3answers
264kviews

Select rows from a data frame based on values in a vector

I have data similar to this: dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L,...