Quickly creating large files

I’m surprised how many people never think to do this… but it makes the job quite easy.

If you need a large text file, perhaps with thousands of lines (or even bigger) – just use doubling to your advantage! For example, create 10 lines. Then use vi (or another editor) to append a copy of the entire file to itself – now 20 lines. If you remember how a geometric progression goes, you’ll have your thousands of lines rather fast:

  1. 10 lines…
  2. 20 lines…
  3. 40 lines…
  4. 80 lines…
  5. 160 lines…
  6. 320 lines…
  7. 640 lines…
  8. 1280 lines…
  9. 2560 lines…
  10. 5120 lines…
  11. 10240 lines…

Ten doublings and we’re at 10,000+ lines. In the right editor (vi, emacs, etc.) this could be a macro for even faster doubling. The same doubling also works at the command line:

cat file.txt >> file.txt

Combined with shell history, that should double nicely – though some cat implementations refuse to append a file to itself (“input file is output file”; see the comments below), and using an editor would be more efficient anyway (fewer disk reads and writes).
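If you prefer to script the whole run, here is a minimal sketch of the doubling as a loop – it assumes seq is available for the seed file, and it sidesteps the self-append problem by going through a temporary file:

seq 1 10 > file.txt                               # seed: 10 lines
for i in 1 2 3 4 5 6 7 8 9 10; do                 # ten doublings
    cat file.txt file.txt > file.tmp && mv file.tmp file.txt
done
wc -l file.txt                                    # 10240 lines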

When writing code, programmers often want to set things off with a line of asterisks, hash marks, dashes, or equals signs. Since I use vi, I like to type five characters, copy those five to make 10, then copy those 10 and paste the result three more times: 40 characters, just like that.
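If you would rather build the line outside the editor, a shell one-liner does the same job:

printf '%40s' '' | tr ' ' '*'; echo               # 40 spaces, turned into 40 asterisks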

If only a certain number of bytes is needed, use dd:

dd if=/dev/random of=myfile.dat bs=1024 count=10

With bs=1024, each block is one kilobyte, so the count is in kilobytes – the example creates a 10K file. (Note that on Linux /dev/random can block while it waits for entropy; /dev/urandom does not – see the comments below.) Using the Korn shell (or any POSIX shell with $(( )) arithmetic), you can scale the block size up to get megabytes:

dd if=/dev/random of=myfile.dat bs=$(( 1024 * 1024 )) count=100

This command will create a 100M file (since bs=1048576 and count=100).

If you want files filled with null bytes, just substitute /dev/zero for /dev/random in the previous commands.
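A zero-filled file works the same way – and, with GNU dd at least, a seek with a zero count creates the file sparsely, much like the mkfile -n trick discussed in the comments below:

dd if=/dev/zero of=zeroes.dat bs=$(( 1024 * 1024 )) count=100                # 100M of actual zero bytes
dd if=/dev/zero of=sparse.dat bs=1 count=0 seek=$(( 100 * 1024 * 1024 ))     # 100M file, no data blocks written (GNU dd)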

If you want text instead, you could use a word dictionary for words, or a Markov chain for pseudo-prose. In either case, if you only want a certain size, cap the output with dd:

~/bin/datasource | dd of=myfile.dat bs=$(( 1024 * 1024 )) count=100

This will give you a 100M file of whatever it is your datasource is pumping out.
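As a sketch of one such datasource – hypothetical, and assuming /usr/share/dict/words and GNU shuf are on hand – an endless stream of random words will do:

while :; do
    shuf -n 8 /usr/share/dict/words | tr '\n' ' '   # eight random words per line
    echo
done | dd of=myfile.dat bs=$(( 1024 * 1024 )) count=100

dd stops after 100 blocks and the loop dies with it, so the pipeline finishes on its own; with GNU dd, add iflag=fullblock if the file must be exactly 100M despite the short reads a pipe can produce.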

23 thoughts on “Quickly creating large files”

  1. The mkfile command isn’t everywhere; it is not on HP-UX 11i (I checked) and it certainly isn’t on OpenVMS 8.3… it’s also not on my FreeBSD 6.2 system either.

    These methods will work anywhere.

    Even so, it is a good point – just not portable.

  2. Annoyingly,
    cat file.txt >> file.txt
    doesn’t work on my cygwin setup – it’s clever enough to realise that input and output are the same.

    The

    dd if=/dev/random of=myfile.dat bs=$(( 1024 * 1024 )) count=100

    method worked great though – I created a 600GB file in (relatively speaking) no time!

    Thanks!

  3. Thanks for this write up.

    One small change: “(since bs=104856 and count=100).” should read 1048576

    Cheers

  4. @natophonic:

    As you mentioned, /dev/urandom is nonblocking. The article you pointed to was the article on /dev/random. The article makes for very interesting reading.

    Under Linux, /dev/random is designed to be free of cryptographic export controls (by not using ciphers to generate randomness), and may block at times to receive enough entropy from the system.

    In contrast, /dev/urandom has a feedback loop where it feeds generated entropy back into itself, and it will not block.

    Under FreeBSD, two things are notable: 1) the Linux-style randomness generator was replaced by something called the Yarrow algorithm; and 2) /dev/urandom is linked to /dev/random – put another way, neither /dev/random nor /dev/urandom will block.

    Both /dev/random and /dev/urandom are available all over: including Solaris, MacOS X, FreeBSD, HP-UX, Tru64, and AIX to name a few.

    In summary: if you require the strongest random numbers possible, use /dev/random. If you require nonblocking I/O or faster generation, use /dev/urandom. And if you’re using FreeBSD, it doesn’t matter which you use.

    The article is very interesting: if you are interested in mathematics, you simply must read it.
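    As a rough check of the speed difference, /dev/urandom streams a megabyte of 1K blocks without stalling, while the same command with /dev/random in its place may sit and wait for entropy on an idle machine (at least on the Linux kernels of that era):

    time dd if=/dev/urandom of=/dev/null bs=1024 count=1024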

    1. # uname -a
      Linux testserv 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux

      On GNU/Linux, using /dev/random as a source doesn’t appear to work – the generated file size is considerably less than what’s expected:
      # dd if=/dev/random of=random.dat bs=1024 count=1
      0+1 records in
      0+1 records out
      40 bytes (40 B) copied, 8.8709e-05 s, 451 kB/s

      Using /dev/urandom does seem to work.

  5. I think you mean /dev/zero not /dev/null. At least, the former worked for me and the latter didn’t.

  6. To create a 10 GB file:

    Go to the file system where you need to create the 10 GB file and run:

    # dd if=/dev/zero of=file_10GB bs=1m count=10k

  7. The often overlooked power of vi provides an easier way to build your 40 * string. From command mode, just type 40i, then * and ESC back to command mode. And voila, 40 *’s!

    (Or 20i, then **, or 10i and ****, and so on. In fact, anything you type will repeat the specified number of times on escape to command mode.)

    This trick works for many other vi commands, too. I often use 4yy to yank four lines, for example, then 10p to insert 10 copies of those four lines.

    Try it!
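    Rendered as keystrokes (normal mode assumed), the sequences described above look like this:

    40i*<Esc>    insert '*' 40 times
    4yy          yank 4 lines, starting with the current one
    10p          put 10 copies of the yanked lines below the cursor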

  8. The command isn’t working for me under Ubuntu 10.04

    If I do

    dd if=/dev/random of=myfile.dat bs=1024 count=1

    it produces a file of 103 bytes. But if I use other numbers, like

    dd if=/dev/random of=myfile.dat bs=1024 count=10

    The command simply never finishes. And if it’s Ctrl-C’ed… the output file only has a small number of bytes in it.

  9. I figured it out… and then later noticed the discussion in the comments. You have to use /dev/urandom instead of /dev/random. As far as I can tell, /dev/random simply never finishes – I let it run for a long time on a 10 kilobyte file and it never completed.

    1. I don’t believe that the stream from /dev/urandom or from /dev/random ever finishes… For example, if you did this:

      cat /dev/urandom > /dev/null

      This command would never stop.

  10. If you are seeking both raw speed and space saving for your file system, on Solaris I suggest mkfile -n.

    [root@ilhsf001h001]# df -F nfs -h 
    Filesystem             size   used  avail capacity  Mounted on
    wfsazlabb15b:/vol/oracle_vm
                           500G   251M   500G     1%    /nas
    [root@ilhsf001h001]# time mkfile 1g /nas/file1 
     
    real    1m31.641s
    user    0m0.354s
    sys     0m6.528s
    [root@ilhsf001h001]# time mkfile -n 1g /nas/file2
     
    real    0m0.016s
    user    0m0.002s
    sys     0m0.009s
    [root@ilhsf001h001]# ls -l /nas
    total 2101320
    -rw------T   1 nobody   nobody   1073741824 Aug 12 09:31 file1
    -rw------T   1 nobody   nobody   1073741824 Aug 12 09:32 file2
    [root@ilhsf001h001]# df -F nfs -h 
    Filesystem             size   used  avail capacity  Mounted on
    wfsazlabb15b:/vol/oracle_vm
                           500G   1.2G   499G     1%    /nas
    [root@ilhsf001h001]# 
    

    As you can see, you get both, because -n doesn’t allocate the file’s data blocks at creation time. Refer to the man page for the drawbacks. This command is used extensively when creating virtual disks (vdisks) for virtualization.

  11. That’s really interesting! I’ve been trying to write a script to run a disk wipe process (using Jim Garlick’s scrub utility – on all free space), and I tried to use up most of the free space first, to speed up testing, by copying a 2.5 GB file about 80 times.
    A colleague suggested there was a way of creating files without filling them, so I Googled away and found this page. Oddly enough, on my Solaris 10 system, the -n parameter actually slows it down! (from 8-18 seconds to 20-30 seconds per GB) …

    # time mkfile 1g -n -v 1GBspacehog.00
    real 0m39.789s
    user 0m0.087s
    sys 0m4.120s
    # time mkfile 1g -v 1GBspacehog.000
    real 0m19.876s
    user 0m0.058s
    sys 0m2.691s
