View Issue Details

IDProjectCategoryView StatusLast Update
0037833FPCRTLpublic2020-09-29 00:38
ReporterAndrey "Croco" Stolyarov Assigned To 
Status newResolutionopen 
Product Version3.3.1 
Summary0037833: Output to text streams other than real disk files goes unbuffered
DescriptionOutput to text streams, including stdout, is done without buffering, unless it is redirected to a real disk file. This significantly raises the program's execution time.
Steps To ReproduceTo reliably reproduce the effect, one can actually use any program that
does text output (including the output to stdout) of a notable volume. To
make a simple test, a program that outputs numbers from 0 to 100000000 (100
millions), one per line, will do:

program million;
    i: longint;
    for i:=0 to 100000000 do

100 mils is choosen because this takes measurable amount of time (tens of
seconds), but not too much in order not to fall asleep.

With the version of RTL present in current repositories, the following
measurements are observed. When the output goes to a plain disk file:

crocodil@frock:~/FPC$ time ./mil > file

real 0m28.787s
user 0m18.256s
sys 0m9.532s

So it took less than 29 seconds of real (wall clock) time, of which the
kernel had to spend 9.5 seconds. If the same thing is done using a
pipeline, the figures become different (please note the time command is
only applied to ./mil, not to the whole pipeline):

crocodil@frock:~/FPC$ time ./mil | cat > file

real 1m15.093s
user 0m26.912s
sys 2m0.156s

The "wall clock" time actually raised less than 2.5 times, but please note
the system time which is more that the "real". This means the poor kernel
had to utilize more than one CPU to serve this process and, roughly
speaking, spent 13 times more time.

This is because for the first run, the output (on the level of syscalls) is
done like this (the following are excerpts from what strace command shows):

write(1, "0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n"..., 256) = 256
write(1, "\n89\n90\n91\n92\n93\n94\n95\n96\n97\n98\n9"..., 256) = 256
write(1, "5\n156\n157\n158\n159\n160\n161\n162\n16"..., 256) = 256

which seems reasonable (altough 256 is not the best buffer size -- well,
4096 is used by most of existing software; but 256 is still acceptable).
But in case the output goes anywhere but a real disk file, the output is
done like this:

write(1, "0\n", 2) = 2
write(1, "1\n", 2) = 2
write(1, "2\n", 2) = 2
write(1, "160\n", 4) = 4
write(1, "161\n", 4) = 4
write(1, "162\n", 4) = 4
write(1, "87337\n", 6) = 6
write(1, "87338\n", 6) = 6
write(1, "87339\n", 6) = 6

This means every writeln within the program causes its own system call
write(2), that is, there's no output buffering in effect. The funniest
thing here is that in case the output is redirected to /dev/null, it works
slower that for a disk file:

crocodil@frock:~/FPC$ time ./mil > /dev/null

real 0m38.543s
user 0m20.356s
sys 0m18.184s

This is ONLY because of total number of syscalls; every single syscall in
this case returns almost immediately, because there's nothing to do for it;
but the costs of switching contexts between the process and the kernel are
still in effect.

This all was not the case for the past releases, e.g., if we compile the
same program with fpc 3.2.0, we've got the following figures:

crocodil@frock:~/FPC$ time ./mil320 > file

real 0m31.800s
user 0m21.064s
sys 0m10.136s

crocodil@frock:~/FPC$ time ./mil320 | cat > file

real 0m21.423s
user 0m16.672s
sys 0m19.272s

crocodil@frock:~/FPC$ time ./mil320 > /dev/null

real 0m17.139s
user 0m16.592s
sys 0m0.544s

Please specifically note less than one second system time in the last case.
Additional InformationIt looks like that this misbehaviour was introduced during the work on
ticket when the function
Do_IsDevice got changed despite that it had no relationship to the bug
reported in that ticket, and despite that I strolngly suggested not to do
this particular change.

The function Do_IsDevice was not broken, it just was (and still is)
misnamed. It was relatively safe not to touch it at all (as from SeekEof
it was called in error, and now in SeekEof there's no call to Do_IsDevice).
The "fix" for non-broken function lead to the obvious misbehaviour, as only
the streams that go to a human user may go unbuffered (and even these,
actually, should be buffered, but in a manner different from other
streams). Pipes, sockets and devices (like, e.g., /dev/null) are NOT of
this kind, only TTYs are; it is common practice in Unix world to pass
gigabytes of data through pipes and/or through standard input/output
streams, so unbuffering them all is a catastrophe, which can not even be
worked around anyhow from within the pascal program. Effectively this
makes Free Pascal totally unusable for implementing a large kind of
programs known as "unix filters".

Please note that before the "fix", Do_IsDevice was implemented as isatty(3)
-- that is, it issued ioctl(2) syscall to determine whether the stream is
connected to TTY or not.

It is very easy to fix this misbehaviour: roll back the Do_IsDevice
implementation to its previous state, and you're done.

What I also strongly suggest is to rename Do_IsDevice into Do_IsConsole or
the like, so that it will not do more confusion than it already did.
TagsNo tags attached.
Fixed in Revision
Attached Files


There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2020-09-28 21:39 Andrey "Croco" Stolyarov New Issue