View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0037833||FPC||RTL||public||2020-09-28 21:39||2021-01-19 14:17|
|Reporter||Andrey "Croco" Stolyarov||Assigned To|
|Summary||0037833: Output to text streams other than real disk files goes unbuffered|
|Description||Output to text streams, including stdout, is done without buffering, unless it is redirected to a real disk file. This significantly raises the program's execution time.|
|Steps To Reproduce||To reliably reproduce the effect, one can actually use any program that|
does text output (including the output to stdout) of a notable volume. To
make a simple test, a program that outputs numbers from 0 to 100000000 (100
millions), one per line, will do:
for i:=0 to 100000000 do
100 mils is choosen because this takes measurable amount of time (tens of
seconds), but not too much in order not to fall asleep.
With the version of RTL present in current repositories, the following
measurements are observed. When the output goes to a plain disk file:
crocodil@frock:~/FPC$ time ./mil > file
So it took less than 29 seconds of real (wall clock) time, of which the
kernel had to spend 9.5 seconds. If the same thing is done using a
pipeline, the figures become different (please note the time command is
only applied to ./mil, not to the whole pipeline):
crocodil@frock:~/FPC$ time ./mil | cat > file
The "wall clock" time actually raised less than 2.5 times, but please note
the system time which is more that the "real". This means the poor kernel
had to utilize more than one CPU to serve this process and, roughly
speaking, spent 13 times more time.
This is because for the first run, the output (on the level of syscalls) is
done like this (the following are excerpts from what strace command shows):
write(1, "0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n"..., 256) = 256
write(1, "\n89\n90\n91\n92\n93\n94\n95\n96\n97\n98\n9"..., 256) = 256
write(1, "5\n156\n157\n158\n159\n160\n161\n162\n16"..., 256) = 256
which seems reasonable (altough 256 is not the best buffer size -- well,
4096 is used by most of existing software; but 256 is still acceptable).
But in case the output goes anywhere but a real disk file, the output is
done like this:
write(1, "0\n", 2) = 2
write(1, "1\n", 2) = 2
write(1, "2\n", 2) = 2
write(1, "160\n", 4) = 4
write(1, "161\n", 4) = 4
write(1, "162\n", 4) = 4
write(1, "87337\n", 6) = 6
write(1, "87338\n", 6) = 6
write(1, "87339\n", 6) = 6
This means every writeln within the program causes its own system call
write(2), that is, there's no output buffering in effect. The funniest
thing here is that in case the output is redirected to /dev/null, it works
slower that for a disk file:
crocodil@frock:~/FPC$ time ./mil > /dev/null
This is ONLY because of total number of syscalls; every single syscall in
this case returns almost immediately, because there's nothing to do for it;
but the costs of switching contexts between the process and the kernel are
still in effect.
This all was not the case for the past releases, e.g., if we compile the
same program with fpc 3.2.0, we've got the following figures:
crocodil@frock:~/FPC$ time ./mil320 > file
crocodil@frock:~/FPC$ time ./mil320 | cat > file
crocodil@frock:~/FPC$ time ./mil320 > /dev/null
Please specifically note less than one second system time in the last case.
|Additional Information||It looks like that this misbehaviour was introduced during the work on|
ticket https://bugs.freepascal.org/view.php?id=37716 when the function
Do_IsDevice got changed despite that it had no relationship to the bug
reported in that ticket, and despite that I strolngly suggested not to do
this particular change.
The function Do_IsDevice was not broken, it just was (and still is)
misnamed. It was relatively safe not to touch it at all (as from SeekEof
it was called in error, and now in SeekEof there's no call to Do_IsDevice).
The "fix" for non-broken function lead to the obvious misbehaviour, as only
the streams that go to a human user may go unbuffered (and even these,
actually, should be buffered, but in a manner different from other
streams). Pipes, sockets and devices (like, e.g., /dev/null) are NOT of
this kind, only TTYs are; it is common practice in Unix world to pass
gigabytes of data through pipes and/or through standard input/output
streams, so unbuffering them all is a catastrophe, which can not even be
worked around anyhow from within the pascal program. Effectively this
makes Free Pascal totally unusable for implementing a large kind of
programs known as "unix filters".
Please note that before the "fix", Do_IsDevice was implemented as isatty(3)
-- that is, it issued ioctl(2) syscall to determine whether the stream is
connected to TTY or not.
It is very easy to fix this misbehaviour: roll back the Do_IsDevice
implementation to its previous state, and you're done.
What I also strongly suggest is to rename Do_IsDevice into Do_IsConsole or
the like, so that it will not do more confusion than it already did.
|Tags||No tags attached.|
|Fixed in Revision|
||Note that you can increase buffersize using settextbuf, for applications that you expect to generate a lot of output.|
Marco, I think it is impossible (or useless) to increase any buffer when no buffering is done at all (which is the case I report), and I see/know no way to forcibly switch the buffering on from within the program.
Actually, what really frustrates me here is the "fix" which fixes nothing but breaks correct behavior.
1) The fix/change ensures consistent behaviour among supported targets - the previous state was a terrible mess with each FPC target behaving differently.
2) The fix/change makes the behaviour more compatible to TP/BP - the underlying DOS function used in TP/BP differentiates between files on one hand and all types of devices (regardless from them being related to a console or any other type of device) on the other hand.
3) The fix/change allows e.g. GUI applications to run a FPC-compiled console programs and show output of such console programs as soon as they are available (without having to modify these programs).
Regarding possibility to forcibly switch the buffering on from within the program - this may be achieved easily using:
TextRec(YourTextFile).FlushFunc := nil;
And finally - anybody expecting decent behaviour when processing 'gigabytes of data' (as mentioned in the bug report) while performing I/O on a level of individual characters (even with standard files on a disk) is simply doing something wrong. This would never work reasonably with TP/BP (processing just hundreds of kilobytes using regular string reading / writing line by line already took considerable time by then; I know it very well because I have been using my own unit for exactly these purposes since 1997 after realizing that I couldn't achieve reasonable performance with standard text I/O) and it doesn't work efficiently with FPC either.
Tomas: IMHO such things enabled by default touch the realm of going for a TP emulator rather than a TP compatible compiler. And FPC is a Delphi compatible too, not just TP.
Andrey: I though in case it was a matter of the buffersize set to 1 or so, then setting a buffersize would override it later. Anyway, apparently not, Tomas already answered.
Tomas, once again, your "fix" to the Do_IsDevice function fixes nothing, and I'm tired to repeat this. The detailed explanation was given on the previous ticket, What the "fix" really does is that it breaks the behavior which was relatively correct before. As of "compatibility with MSDOS", it is impossible to be "compatible" with it in dealing with pipes, sockets and other similar streams, as there were no such things in MSDOS. If Free Pascal is declared being cross-platform compiler and not an MSDOS emulator, perhaps it should work correctly on platforms other than MSDOS, such as unix-like systems.
Next, the TextRec is a part of internals, no? That is, the fields of this record are not the part of the officially documented API, am I wrong? If so, then this can hardly considered a solution.
The most interesting thing today is that idea of 'performing I/O on a level of individual characters'. First of all, I never mentioned anything like that. However (second) this might surprize you, but there's absolutely nothing wrong with such an output, provided that correct buffering is done. If you take a look, e.g., inside the glibc implementation, you'll notice that all output created with functions like printf, fputs etc., actually goes through the fputc function which places a single char into the buffer. Actually, all data in the world is built byte-by-byte, or may be word-by-word. The buffering is what protects the poor OS kernel from overwhelming number of syscalls (and, heh, this is exactly what you break with your "fix"). What really matters here is that in Free Pascal, the standard output (that output/stdout thing) is always of type 'text', so, yes, any data that must go to stdout just have to go thru all these text-output facilities. This is nothing wrong, too (no matter how huge are the data to be transferred), as nothing prevents the program from outputting anything byte by byte -- provided that the correct buffering is active.
Marco, I'd appreciate if you take time to read all the discussion on the "related" ticket (and the texts on this ticket, too). I'm a bit surprised with how hard it appeared to explain obvious and well-known things to someone here, and I don't feel powerful enough to repeat everything again.
> TextRec is a part of internals, no? That is, the fields of this record are not the part of the officially documented API, am I wrong?
||Andrey: I'm tired as well, which is the reason why I won't add any further comments here after this one. You didn't mention output of individual characters, but your artificial test program writes between 1 and 9 characters at a time. I can easily show that the performed change improves performance in other cases considerably and that it has completely negligible impact in yet other cases, but that is not the point - the point is that FPC should behave the same way on all supported platforms (which wasn't the case before my change) and that it should provide possibility of working with the output of FPC compiled programs immediately if the output doesn't go to a disk file. Your question about TextRec was already answered by Bart in the meantime, I can just add that TextRec was part of the officially documented API even with TP/BP (Dos.TextRec).|
> but your artificial test program writes between 1 and 9 characters at a time
Well, it does so solely because of your "fix" which is in question here. Normally output to a text stream is not a real output operation, it is an operation of composing data internally withing the program's own memory and costs next to nothing, compared with output syscalls. This is what buffering is for. Funny to see that you accuse my "artificial" program for things you introduces with your broken "fix".
> I can easily show that the performed change improves performance in other cases considerably
No you can't, because it's impossible due to the nature of basic things. If you can, surprize me, heh. Or at least surprise others.
> won't add any further comments here after this one
Okay, it's your right.
Can anyone here tell me how to contact more reasonable people on the Free Pascal team?
||As an exception to my previous statement of not adding further comments: All members of the Free Pascal team received notification about your bug report and moreover, I explicitly mentioned in the core team mailing list that I'm too biased in this case, so that I don't want to make the final decision. I won't comment your statement suggesting who is reasonable and who not, anybody may judge it for himself or herself.|
citing the page on this reference:
> TextRec is the underlying type used in text files. It should be treated as opaque and never manipulated directly.
This is exaclty what I expected, that is, the "solution" offered to me here is not a solution.
I would like to put two cents into this discussion, because for me this bug feels important.
It is my point of view and I do not attaching it to any of the technical terms provided here, but I think when we are talking about "cross-platform" development, we need to ensure that our internal behaviors works the same way as target-platform expects and that means true cross-platform support and if we are moving priority to have same behavior for an interfaces inside our application instead of keeping expected behavior, that leads to fake cross-platform support and users, who expects OS-oriented behavior will encounter troubles.
So in my point of view we need to keep behavior as Croco said and find another way to implement higher level thing what Tomas talking about.
Because operation systems lays much down and their behavior much more important, than upper interface in terms of software for software developers.
p.s Of course it is just my motivation to talk about it and it can be differ in compare with motivations of Croco and Tomas, but maybe my words can change someone's mind.
Thanks for your input. What do you mean by "the way as target platform expects"? None of the FPC supported platforms has any notion of outbut buffering or not buffering as a property of the "platform"; such behaviour is always related to particular API calls (should it be OS API, or RTL, etc.) as far as I can see. Talking about behaviour native for a particular platform (in this case Linux referred to in the report), it is IMHO much more predictible / "native" if a program behaves the same way if sending its output directly to the console and if sending it to the console through the 'tee' filter.
> much more predictible / "native" if a program behaves the same way if sending its output directly to the console and if sending it to the console through the 'tee' filter.
This contradicts with what Unix (not only Linux) people can expect because of well-established unix traditions, so at least this has nothing to do with being "native" for any of the unix-like platforms. That is, no other language, interpreter or compiler (of what I know, and I know a lot of them as actually Linux is the only OS I use for more than 20 years) will demonstrate such behavior.
From the other hand, this (mis)behavior makes FPC useless for programming unix filters as such. Furthermore, as far as I can tell, implementations of that Do_IsDevice thing for various platforms were consistent between each other, as well as with all uses of the function found in the rtl code (after the absolutely nonsense call to Do_IsDevice was finally removed from SeekEof code, which costed me, BTW, way too much time wasted on discussions), in that they did their best to detect whether the output goes to a human user or not. For unix-like systems, this is exactly what isatty(3) function does, and Do_IsDevice versions for unix-like systems were all the same doing exactly this; for other platforms, they actually checked whether the stream is a "standard" one, and when there's no other option, this could be not too bad. What is "inconsistent" here is the name of the function, which definitely must be named differently to prevent more confusion: it should be named Do_IsTerminal or something similar.
Breaking functionality of possible unix-filters written in FPC will gain you nothing, you only loose.
> So in my point of view we need to keep behavior as Croco said and find another way to implement higher level thing what Tomas talking about.
the problem is that these "higher level things" don't exist. Could you may be read the (well, way too long) discussion on the "related" (now closed) ticket, just to get the explanation of what really happens here? BTW, your name looks like Russian, so if you speak Russian, may be you can be so kind to contact me directly (e.g. via my site, http://stolyarov.info) so that we can discuss all this stuff more quickly?
Consistent? Original state (still existing in FIXES_3_2):
AIX, Haiku, Solaris - no buffering for std input/output/error regardless from them being redirected or not, always buffering for all other file handles regardless from them possibly being attached to console or anything else.
BeOS, BSD targets, Linux - no buffering if the file handle was attached to a TTY device
EMX, GO32v2, MSDOS, Watcom - buffering only if the file handle is a disk file (not a device - as the function name suggests and exactly as with TP/BP)
OS/2 - no buffering if the file handle is attached to any device, buffering in all other cases
Win16 - no buffering for std input/output/error (regardless from them being redirected or not) and for all other file handles attached to anything different from a real disk file
Win32/Win64 - no buffering for std input/output/error _unless_ they are redirected, but also no buffering for other handles originally attached to console _even_ if redirected
Current state in trunk:
All above mentioned targets - no buffering for file handles attached to any device, pipe or socket
>Win32/Win64 - no buffering for std input/output/error _unless_ they are redirected, but also no buffering for other handles originally attached to console _even_ if redirected
Then the redirection is failing, I've been waiting on a redirection for several /minutes/ now with one core at full CPU, filesize is only a few hundreds of kbs. It is the redirection of fpdoc as -gh compiled file, piping with
fpdoc arguments 1> log.txt 2>&1
||@Marco: Is this trunk, or 3.2.x? In other words - is it supposedly related to this bug report (it should not be), or is it some issue related to the original implementation described in my comment (or some other problem)?|
|2020-09-28 21:39||Andrey "Croco" Stolyarov||New Issue|
|2021-01-02 23:00||Tomas Hajny||Relationship added||related to 0037716|
|2021-01-03 14:49||Marco van de Voort||Note Added: 0128046|
|2021-01-04 08:22||Andrey "Croco" Stolyarov||Note Added: 0128063|
|2021-01-04 10:05||Tomas Hajny||Note Added: 0128064|
|2021-01-04 10:06||Tomas Hajny||Note Edited: 0128064||View Revisions|
|2021-01-04 12:22||Marco van de Voort||Note Added: 0128067|
|2021-01-04 19:34||Andrey "Croco" Stolyarov||Note Added: 0128082|
|2021-01-04 22:11||Bart Broersma||Note Added: 0128084|
|2021-01-05 00:03||Tomas Hajny||Note Added: 0128086|
|2021-01-07 16:23||Andrey "Croco" Stolyarov||Note Added: 0128142|
|2021-01-07 16:36||Tomas Hajny||Note Added: 0128144|
|2021-01-09 07:58||Andrey "Croco" Stolyarov||Note Added: 0128191|
|2021-01-11 15:21||Alexander Logunov||Note Added: 0128266|
|2021-01-11 15:58||Tomas Hajny||Note Added: 0128268|
|2021-01-11 15:59||Tomas Hajny||Note Edited: 0128268||View Revisions|
|2021-01-16 22:59||Andrey "Croco" Stolyarov||Note Added: 0128382|
|2021-01-16 23:05||Andrey "Croco" Stolyarov||Note Added: 0128383|
|2021-01-17 01:33||Tomas Hajny||Note Added: 0128384|
|2021-01-19 11:52||Marco van de Voort||Note Added: 0128406|
|2021-01-19 14:17||Tomas Hajny||Note Added: 0128415|