Let's say you have a list of users in one data frame and a list of their purchases in a second data frame. You'd like to combine these data frames into one based on the user id. In this article, we will learn how to use joins in R to combine data frames by column.
Merge Two Columns In R
The basic way to merge two data frames is to use the merge function. We supply the two data frames and the column that we want to merge on.
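As a minimal sketch of this basic merge, the example below uses made-up sample data. The data frame names, column names, and values are assumptions for illustration and do not come from the original article.

```r
# Hypothetical sample data (names and values are illustrative assumptions)
users <- data.frame(
  user_id = c(1, 2, 3),
  name    = c("Larry", "Moe", "Curly")
)
purchases <- data.frame(
  user_id = c(1, 2, 3),
  product = c("Book", "Lamp", "Chair")
)

# Merge the two data frames on the shared user_id column
merged <- merge(users, purchases, by = "user_id")
print(merged)
```

The result has one row per matching user_id, with the key column first, followed by the remaining columns of users and then those of purchases.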
We can merge two data frames in R by using the merge function or by using the family of join functions in the dplyr package. By default, merge matches on the columns that have the same name in both data frames. The merge function in R is similar to a database join operation in SQL. Often you may be interested in joining multiple data frames in R; this is easy to do using the left_join function from the dplyr package. When merge is called with only the two data frames, it performs a natural join, which is an inner join on the shared columns; the other types of joins are described below. To merge on two columns, we pass both column names as the key, separated by a comma (,), in a character vector given to the by argument.
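Merging on two key columns could look roughly like the sketch below. The sales and targets data frames, their columns, and their values are hypothetical and only serve to show the by argument with more than one column.

```r
# Hypothetical sample data with a two-column key (user_id, year)
sales <- data.frame(
  user_id = c(1, 1, 2),
  year    = c(2023, 2024, 2024),
  total   = c(10, 20, 30)
)
targets <- data.frame(
  user_id = c(1, 2, 2),
  year    = c(2024, 2023, 2024),
  target  = c(15, 25, 35)
)

# Pass both key columns as a character vector; rows must match on both
both <- merge(sales, targets, by = c("user_id", "year"))
print(both)
```

Only the rows whose user_id and year both match appear in the result.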
By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted and joined together.
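When the key column has a different name in each data frame, by.x and by.y can be used as in this small sketch. The column names id and user_id, and all values, are assumptions chosen for the example.

```r
# Hypothetical data: the key is called "id" in users and "user_id" in purchases
users <- data.frame(
  id   = c(1, 2, 3),
  name = c("Larry", "Moe", "Curly")
)
purchases <- data.frame(
  user_id = c(1, 2, 3),
  product = c("Book", "Lamp", "Chair")
)

# by.x names the key in the first data frame, by.y the key in the second
merged <- merge(users, purchases, by.x = "id", by.y = "user_id")
print(merged)
```

The key column in the result takes its name from the first data frame, so it is called id here.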
This type of join is known as an 'inner join' and will include only items that match. For example, if one of the users didn't have a purchase, it would not be shown. Let's take a look at the same merge, but remove the purchase from Larry.
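The inner join behavior described above can be sketched as follows, using hypothetical data in which Larry has no purchase. All names and values are illustrative assumptions.

```r
# Hypothetical sample data: Larry (user_id 1) has no purchase
users <- data.frame(
  user_id = c(1, 2, 3),
  name    = c("Larry", "Moe", "Curly")
)
purchases <- data.frame(
  user_id = c(2, 3),
  product = c("Lamp", "Chair")
)

# Default merge is an inner join: only user_ids present in both are kept
inner <- merge(users, purchases, by = "user_id")
print(inner)
```

Larry does not appear in the result because there is no matching row in purchases.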
If we want to show Larry, even though he doesn't have a purchase, we can use a left join, which keeps every row in our left table (users in this case) and joins the matching rows from the right table. To do this, we pass the all.x = TRUE argument to the merge function.
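A left join with merge might look like this sketch. The sample data is hypothetical, with Larry again having no purchase.

```r
# Hypothetical sample data: Larry (user_id 1) has no purchase
users <- data.frame(
  user_id = c(1, 2, 3),
  name    = c("Larry", "Moe", "Curly")
)
purchases <- data.frame(
  user_id = c(2, 3),
  product = c("Lamp", "Chair")
)

# all.x = TRUE keeps every row of the first (left) data frame
left <- merge(users, purchases, by = "user_id", all.x = TRUE)
print(left)
```

Larry is now kept, and his product column is filled with NA because no purchase matches.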
In a similar manner, if we have a purchase but no matching user data (maybe the user was deleted), we can use a right join to keep the product in the result. To do this, we pass all.y = TRUE to the merge function.
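A right join can be sketched the same way with all.y = TRUE. In this hypothetical data, the purchase for user_id 4 has no matching user.

```r
# Hypothetical sample data: user_id 4 has a purchase but no user record
users <- data.frame(
  user_id = c(1, 2),
  name    = c("Larry", "Moe")
)
purchases <- data.frame(
  user_id = c(1, 2, 4),
  product = c("Book", "Lamp", "Desk")
)

# all.y = TRUE keeps every row of the second (right) data frame
right <- merge(users, purchases, by = "user_id", all.y = TRUE)
print(right)
```

The purchase for user_id 4 is kept, with NA in the name column.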
If we would like to do both the left and the right join at once, we can use the outer join by passing all = TRUE to the merge function. This keeps every row from both tables and fills in NA wherever a match is missing.
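A full outer join could be sketched as below. The data is hypothetical: Larry has no purchase, and the purchase for user_id 4 has no user.

```r
# Hypothetical sample data with unmatched rows on both sides
users <- data.frame(
  user_id = c(1, 2),
  name    = c("Larry", "Moe")
)
purchases <- data.frame(
  user_id = c(2, 4),
  product = c("Lamp", "Desk")
)

# all = TRUE keeps unmatched rows from both data frames
outer <- merge(users, purchases, by = "user_id", all = TRUE)
print(outer)
```

The result contains user_ids 1, 2, and 4, with NA in product for Larry and NA in name for user_id 4.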
Append Two Data Frames In R
These joins will help you with most of your day-to-day tasks. There are a few other joins to look into. Also, many libraries have added functions that are a bit easier to use. We will learn these later on.