In this article we will examine yet another of the GNU Core Utilities that make up the foundation of the Linux operating system. The csplit command is a small, yet powerful text utility that allows you to split a file into two or more parts using context lines.

The csplit command should not be confused with the split command. Both break large files into smaller pieces, but split cuts at fixed sizes or line counts, while csplit cuts based on context lines. With csplit you can split a file at a given line number or use regular expressions to split on a pattern match. Let’s look at some examples.

We will use the following text file as an example (one of my favorite songs as a child):

Pepino, oh, you little mouse, oh, won’t you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time

Splitting a File After x Number of Lines

As we can see, our example text has four lines. We can split the file at the second line by using two as a command line argument.

$ csplit pepino.txt 2

This will output two numbers showing the sizes of the newly created chunks/files. In this case they are 52 bytes and 169 bytes respectively. You can suppress the file size output using the -s option if desired.

$ csplit pepino.txt 2
52
169
$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi  52 Sep 26 23:23 xx00
-rw-rw-r--. 1 mcherisi mcherisi 169 Sep 26 23:23 xx01

The csplit command created two new files named xx00 and xx01. These files contain the source text which has been split at line 2 per our request. So “xx00” holds the first line, and “xx01” holds the rest of the text.

NOTE: Later in this article we will discuss formatting the name of the output files.

$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
$ cat xx01
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
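
The -s option mentioned above can be tried with a small, self-contained sketch like the one below. It assumes GNU csplit is installed and recreates the sample file in a temporary directory; the variable names (workdir, first_lines, rest_lines) are only illustrative.

```shell
# Work in a throwaway directory so the xx* files do not clutter anything.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate the four-line sample file from this article.
cat > pepino.txt <<'EOF'
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
EOF

# -s suppresses the byte counts that csplit normally prints.
csplit -s pepino.txt 2

# Count the lines in each piece to confirm where the split happened.
first_lines=$(wc -l < xx00)
rest_lines=$(wc -l < xx01)
echo "xx00: $first_lines line(s), xx01: $rest_lines line(s)"
```

If everything works as described above, xx00 should hold the first line and xx01 the remaining three.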

Splitting a File at a Matched String

Instead of splitting the file by line number, you can also split at a matched string. For example, we can split the file at the line containing the word “scare”. The split happens just before the matching line. We must wrap the string we are trying to match in forward slashes (/).

$ csplit pepino.txt /scare/
103
118
$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
$ cat xx01
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time

As you can see above, csplit read the file until it found the matched pattern (scare) then split the file. You can use regular expressions here as well.

$ csplit pepino.txt '/^[Ff]/'
52
169

NOTE: Pattern matching with a / delimiter will copy up to, but not including, the matched line. The matched line itself becomes the first line of the next chunk, which is why the second line of our file (the one starting with “Find”) lands in xx01.

If you use the % delimiter, csplit will skip to the matched pattern and ignore the lines before the match.

$ csplit pepino.txt %cheese%
118
$ cat xx00
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time

Since we used the percent sign as the delimiter, csplit created only a single file, which starts at the matched line and runs to the end of the file. The lines before the match were ignored. We “SKIPPED” to the match.
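
To see the difference between the two delimiters side by side, here is a minimal sketch, assuming GNU csplit. The keep- and skip- prefixes (set with the -f option) and the variable names are made up for this illustration.

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > pepino.txt <<'EOF'
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
EOF

# /scare/ keeps everything: the lines before the match go to the first piece.
csplit -s -f keep- pepino.txt '/scare/'

# %scare% discards the lines before the match and writes only the rest.
csplit -s -f skip- pepino.txt '%scare%'

keep_count=$(ls keep-* | wc -l)
skip_count=$(ls skip-* | wc -l)
echo "slash delimiter wrote $keep_count file(s), percent delimiter wrote $skip_count file(s)"

# The single "skip" piece should be identical to the second "keep" piece.
cmp -s keep-01 skip-00 && echo "skip-00 matches keep-01"
```

Using a different prefix for each run keeps the second csplit from overwriting the files written by the first.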

Repeat Pattern x Number of Times

By default csplit will split the file at the first match only. You can easily instruct it to match the pattern more than once by adding a repeat count in curly braces after the pattern. Here we instruct it to find the pattern three times.

It is important to understand that the number in the braces is the number of times csplit will repeat the pattern, not the total number of matches. Here we are using the number 2, which means it will stop after it finds the pattern 3 times: the initial match, plus the 2 repeats we asked for.

$ csplit pepino.txt '/[Rr]/' '{2}'
52
51
61
57

Since all but the first line have the letter “r” in them, this creates four files. The second line was the initial match, and the third and fourth lines were the 2 repeated matches.

NOTE: You can use an asterisk {*} to match the pattern as many times as possible.
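
As a rough sketch of the asterisk form, assuming GNU csplit: the example below repeats the same “r” pattern until the input runs out, so we do not need to know the number of matches in advance. The line- prefix and the piece_count variable are illustrative only.

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > pepino.txt <<'EOF'
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
EOF

# '{*}' repeats the pattern as many times as possible.
# The quotes keep the shell from interpreting the brackets and the asterisk.
csplit -s -f line- pepino.txt '/[Rr]/' '{*}'

piece_count=$(ls line-* | wc -l)
echo "pieces written: $piece_count"
```

With our sample file this should produce the same four pieces as the {2} example, one per line.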

Splitting the File with a Line Offset

Using a line offset allows you to split x number of lines after or before a match. An offset must have a + or - followed by a positive integer. Here we are telling csplit to use an offset of 2 additional lines.

$ csplit pepino.txt /Pepino/+2
103
118
$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play

Remember, pattern matching copies up to, but not including, the matched line. This is why using an offset of +2 includes the actual matched line and one additional line.
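
A negative offset works the same way in the other direction. The sketch below, which assumes GNU csplit, matches “cheese” on the third line and moves the split point up by one line. The variable name is just for illustration.

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > pepino.txt <<'EOF'
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
EOF

# "cheese" is on line 3. An offset of -1 moves the split point up one line,
# so the second piece should start at line 2 instead of line 3.
csplit -s pepino.txt '/cheese/-1'

first_line_of_second_piece=$(head -n 1 xx01)
echo "xx01 starts with: $first_line_of_second_piece"
```

Here xx00 should hold only the first line, and xx01 should begin with the “Find yourself another house” line.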

Formatting the csplit Output File Names

There are two parts to the output file names, the prefix (xx) and the suffix (00). Both of these can be easily modified. Let’s take a look at how to modify the prefix first.

Here we will use the -f option followed by our desired prefix (the word putorius followed by a dash).

$ csplit pepino.txt /cheese/ -f putorius-
103
118
$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:17 putorius-00
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:17 putorius-01

As for the suffix, you can use the -n option to set the number of digits used. By default two digits are used (xx00). Here we will set it to four.

$ csplit pepino.txt /cheese/ -f putorius- -n 4
103
118
$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:24 putorius-0000
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:24 putorius-0001

You can also use sprintf formatting with the -b option, although there are limited format specifiers.

$ csplit pepino.txt /cheese/ -f putorius- -b "%d"
103
118
$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:26 putorius-0
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:26 putorius-1
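
The format string can also carry a width and literal text, which is handy for adding a file extension. This is a minimal sketch assuming GNU csplit. The verse- prefix and the .txt extension are arbitrary choices for the example.

```shell
# Work in a throwaway directory and recreate the sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > pepino.txt <<'EOF'
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
EOF

# %03d pads the piece number to three digits, and the text after it
# (.txt) is copied into every output file name as-is.
csplit -s -f verse- -b '%03d.txt' pepino.txt '/cheese/'

ls verse-*
```

The single quotes around the format keep the shell from touching the percent sign. The pieces should come out as verse-000.txt and verse-001.txt.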

Conclusion

Just like most of the text utilities included in the coreutils package, csplit is powerful and does its one job very well. In this article we covered the basics of splitting a file on the Linux command line with csplit. If you have any questions or comments we would love to hear from you below.
