View Issue Details

IDProjectCategoryView StatusLast Update
0027670FPCCompilerpublic2015-09-27 15:34
ReporterUlrich Thome Assigned ToMarco van de Voort  
PrioritynormalSeverityminorReproducibilityrandom
Status resolvedResolutionunable to reproduce 
Platformx86/x64OSWindows Server 
Product Version2.6.4 
Summary0027670: Append sometimes overwrites file with garbage
DescriptionTo tell the story I have to start in 2010. I wrote a program to log data from my solar inverter and write it to a text file. The program was complied using fpc 2.4.0 on a windows 2000 computer. The resulting exe file was run on a diferent machine. It was scheduled by the windows scheduler to start once a day and add a line to a text file. It ran flawlessly for 4 years on WHS 2001 (i.e. windows Server 2003) and subsequently for 1 year on a WHS 2011 (i.e. Server 2008 R2).
At that time, I recompiled the program using fpc 2.6.4 on a win7-64 AMD Phenom II machine because I wanted to change the drive letter of the path where the log file is supposed to be written to. Surprisingly, the exe became 100.000 bytes longer. (419.000 bytes instead of 319.000 bytes). When using this program, for sevaral weeks, the log file became repeatedly overwritten by garbage, leding to data loss. It worked well for a while and suddenly there was only garbage left, with some correct readings at the end of the file, and the file maintained the correct length. Another log file simultaneously written to a network drive was not harmed. As a hot fix, I went back to the old exe file and changed the drive letter using a hex editor to avoid recompiling. With the old exe, the issue was gone.

More recently, I wrote another similar program to log data from an electric meter. I compiled it with fpc 2.6.4 and deployed it. After 1 week of normal functioning. I found again the respective log file full of garbage. An example is attached.

Now I got frustrated and uninstalled fpc 2.6.4. Instead, I went back to fpc 2.4.0, installed it on the Windows 7 machine used for development, recompiled both programs and deployed them on the home server. The problem has not recurred yet, hopefully it is gone. Furthermore, the fpc 2.4.0 compiled programs are much smaller than the ones compiled with fpc 2.6.4.
Steps To ReproduceRunning the program every day by the windows scheduler occasionally leads to a corrupted log file on the internal drive pool (v:\). The programs are set to write the file twice, the other one, going to the network drive (y:\), remains unharmed. It is unclear when the file corruption happens and what causes it. It is also unclear why it does not happen every day. I suspect it is related to the code generated by fpc 2.6.4 rather than 2.4.0, and maybe interference with the special environment the programs are deployed in (see below).
Additional InformationThe computer running the compiled programs is an AMD N54L with ECC RAM, running WHS2011 alias Windows Server 2008 R2. Furthermore, transparent raid by Flexraid is installed, and the log files are being written on the pooled drive created by Flexraid.

By the way, If you think my programming is funny, don't worry. My real profession is medicine. I am programming only small things to automate things in my home and make my life a bit easier.
TagsNo tags attached.
Fixed in Revision
FPCOldBugId
FPCTarget
Attached Files

Activities

Ulrich Thome

2015-03-14 16:46

reporter  

bug_report_files.ZIP (512,904 bytes)

Thaddy de Koning

2015-03-14 17:04

reporter   ~0081955

Last edited: 2015-03-14 17:28

View 6 revisions

This belongs on the forum, frankly. But I'll have a look.

[edit]
mt174_1.pp compiles to 198,656 bytes with fpc 3.1.1
si5000.pp compiles to 196.608 bytes with fpc 3.1.1

I guess you need to read some more optimization documention.

Compiled with (given your paths are correct) ppc386 -CX -XX -Xs -Os -O4 <your program>

Since I do not have your hardware available, I can't test its working.

I suggest you ask to close this. There is a dependency on synapse (synaser) that is likely an issue. Although synaser trunk is highly stable. Are you maybe using an old version of the synaser code? That would be like practising medicin before penicillin was discovered ;)

Software bugs have always been equivalent to bugs in the medical sense.

Marco van de Voort

2015-03-14 18:22

manager   ~0081958

Start by add a mutex so that there are never two instances concurrent

Bart Broersma

2015-03-14 18:51

reporter   ~0081961

Last edited: 2015-03-14 18:54

View 2 revisions

You could try to isolate the problem by removing the depency from synaser and just write dummy stuff to the files.
If the problem disappears the bug is in the code using synaser (or synaser itself).

This looks funny to me (my real profession is medicine too):

  assign(ablage,logdatei);
  {$I-} append(ablage); {$I+}
  if ioresult <> 0 then begin
    rewrite(ablage);

If append fails for any other reason than the file not being there, it will be overwritten (if rewrite succeeds). This might not be the behaviour you intended.

To me more logical would be:

  {$I-}
  if not fileexists(logdatei) then
    rewrite(ablage)
  else
    append(ablage)
  {$I+}
  if ioresult <> 0 then
  begin
    //optionally show or log some errormessage and then exit this procedure
    exit;
  end;

Ulrich Thome

2015-03-14 21:39

reporter   ~0081968

Last edited: 2015-03-14 21:47

View 3 revisions

Hi All,
Thank you very much for your quick responses to my questions. Thanks also to Bart for his coding suggestions. I will use them.
The theory that synaser is the culprit, however, does not explain the following two observations:
First, it is not he latest line of the data set that is overwritten by garbage, but rather, older parts that had been created days before and were definitely still existing and intact the day before. Also, the last characters of the garbage file contain real data. Interestingly, the file length is exactly correct even after the corruption occurred. It looks to me as if the whole file gets loaded into a buffer and then written back to the disk when closing, and sometimes, the buffer contents get replaced by the garbage.
Second, my program writes two log files on different locations to be sure that the data survives. If bad data would be coming via synaser from the serial port, the bad data would go to both files, but it does not. Only the local file gets corrupted.
For these reasons, I suspected that a bug might have been introduced into the file buffer handling of programs compiled with later versions of FPC. therefore, I contacted you. I am afraid, I did not know better. You know better what changes to the FPC code have been made. If this idea cannot be true, well, I have to re-think what else might be happening.
Thanks a lot for all your insights.

Bart Broersma

2015-03-15 12:46

reporter   ~0081984

Please report back what happens tot the test without synaser.

Ulrich Thome

2015-03-18 22:23

reporter   ~0082100

Hi,
I have just created a program that does not use synaser but writes a similar file as si5000, just putting down dummy data, the same every time. I have compiled it with fpc 2.6.4. I have deployed it onto the server and scheduled it to run every day, like the logging programs. Now I will have to watch out for the sporadic malfunctions I described before. I will let you know when something shows up.
By the way, the real logging programs work fine for a while now, since I have recompiled them with fpc 2.4.0. they already lasted longer than the fpc 2.6.4 versions. I am also watching them daily, of course.
Regards
Ulrich

Ulrich Thome

2015-03-21 15:54

reporter  

bug_report_files_2.ZIP (289,013 bytes)

Ulrich Thome

2015-03-21 16:14

reporter   ~0082164

Hi,
the trap cought it today! the test program has indeed overwritten the dummy log file with garbage data, with a line of real data at the end. Yesterday, the log file, built up over several days, still looked fine. What was overwritten was the older part that contained good data until yesterday. the last line shows real data, and the file length is correct. See attached file testdatei_v.txt. The issue affected only the file on the local drive, the version on the network drive was updated correctly. See testdatei_y.txt which has the same length.
The program that wrote these files was the test program derived from si5000 by removing all synaser stuff and then compiled using fpc 2.6.4.
At the same time, my real logging programs, si5000.exe and mt174.exe, which were compiled using fpc 2.4.0, worked fine and logged good data on the local and network drives, as they always did since I had recompiled them with fpc 2.4.0 on Mach 6th. Before that date, versions compiled with fpc 2.6.4 also overwrote the log files with garbage occasionally.
Therefore, the problem must have something to do with the newer version 2.6.4 of fpc, and not with synaser. I am puzzled, but that is the only remaining conclusion.
All files are attached.

Ulrich Thome

2015-03-29 22:19

reporter   ~0082447

Hi All,
It seems to me that this report does not get any attention any more. Please be so kind and look at my above reports and the attached files. I show you that fpc 2.6.4 compiled programs may operwrite files with garbage when the append command is used. It also happens without synaser! This never happened with fpc 2.4.0 which I used before.

I do think that fpc is a great project and extremely useful, and maybe I should not complain beacuse it is free. Therefore, I apologize. However, I do believe that this issue is important because I believe that nobody wants corrupted files. For me, I must consider fpc 2.6.4 beta and will have stick to 2.4.0 when handling important data files.

Sincerely,
Ulrich

Mark Morgan Lloyd

2015-03-29 22:45

reporter   ~0082449

I do a lot of serial comms stuff and related logging, and so far have no reason to believe that FPC's file handling is deficient. I admit to not using Synaser and I suppose that there's a possibility that it could be screwing the runtimes in some way, but I consider that pretty unlikely.

What calls that program, and how do you know that only one copy is running at any given moment?

Marco van de Voort

2015-03-30 14:54

manager   ~0082464

Since you schedule these programs, did you add a mutex to make absolutely sure there are never two instances of the file writing to the same logfile?
(comment http://bugs.freepascal.org/view.php?id=27670#c81958)

Currently there is simply no real lead where to begin, except the fairly meagre observation that the old program behaved different from the new one. (which can be due to coincidence factors)

Thaddy de Koning

2015-03-30 17:22

reporter   ~0082471

Last edited: 2015-03-30 17:25

View 3 revisions

@Ulrich: as Marco writes, we need more input. I can say - I also use synapse in real life (some people know the scale of them) applications - there is nothing I can see that is wrong with them or with FPC with regards to the subject.
We never have seen a case like yours for over 6 years....
Can you try and point out exactly what goes wrong in a repeatable simple example.

Can you also check that write permissions are mutually exlusive? (under windows that is much simpler than a mutex, it's a flag) and read permissions do not block?

Ulrich Thome

2015-04-01 22:47

reporter   ~0082539

Hi all,
thanks for all your insights. i am going to try to give you the requested details:
@Mark: the problem occurred also when all the serial and synaser stuff was removed and the program was set to generate and write dummy data instead of getting the data from the serial port. The programm is called by the Windows Scheduler built into Vista alias Home Server 2011.

@Marco: I did not add a mutex, and I have to admit that I do not know what that is. However, the program is scheduled to run once a day, and it usually finishes and terminates after a few seconds. The Windows scheduler calling it is set to terminate it after 3 hours if it is still running. Therefore, I believe that two instances running simultaneously cannot happen, because the program is long finished or terminated before the next scheduled run is due.

@Thaddy: As I said, the garbage data occurred also when there was no synapse etc. included in the test program. Variables were just filled with constants and written to a file every day, and one day, all was replaced by total garbage. I believe that this error is probably rare and did not occur to you, however, it left me upset because I lost real and unrecoverable data on several occasions because of that, and I suddenly had to start paying attention to a process that had not been a problem for several years. After several occurrences of data loss, all became reliable again after recompiling with the older compiler version. Yes, this is indeed strange. I am now having difficulties to find conditions that repeat it 100%, it has always been a sporadic error. Currently, I am suspecting that the garbage might occur when the file to be written to has a certain size. I am currently trying to sort this out. I am running a test program by the scheduler and watch for garbage in the file that is being created and slowly growing.

Unfortunately my progress is slow because of my demanding job on a neonatal intensive care unit. But I will do my best to give you the most detailed information I can.

Cyrax

2015-04-02 02:08

reporter   ~0082545

Do you have any global string variables? Also are you using any optimization switches? If you are, remove and disable any optimizations you might use.

Bart Broersma

2015-04-04 16:04

reporter   ~0082586

schreiben('v:\shares\Solar\testdatei.txt');
  schreiben('\\FRITZ-NAS\FRITZ.NAS\General-USBFlashDisk-01\solar\testdatei.txt');
  schreiben('k:\testdatei.txt');

Are you absolutely sure that none of these filenames map to the same file (at any time)?

Bart Broersma

2015-04-04 23:42

reporter   ~0082598

Last edited: 2015-04-04 23:44

View 5 revisions

See attached file append-test.zip.

I have simplified the test_1.pp program a bit.
(I modified the output a little so that it can be examined a little easier in notepad (used fixed format instead of tab separated file).)
I have made a separate program that cehcks the integrity of the 'testdatei.txt' file.
(I used the enclosed gappendtest project for testing)

I have run in access of 7500 consecutive appends and checks, all without any failure (fpc 2.6.4 on win32).

Can you adapt your test setup to run appendbug and appendcheck (compile with 2.6.4 please) to see if you can reproduce the error?
You need to ater the TestFilename constant in appenddefs.pp.

If you can reproduce this on drive v:, then please also check if you can reproduce it on another (not shared) local drive.

Q: the V: drive, is it a network share and if so, how is the filesystem mounted (NFS, SMB)?

Ulrich Thome

2015-04-05 17:54

reporter  

fehler150405.zip (1,052 bytes)

Ulrich Thome

2015-04-05 18:04

reporter   ~0082630

Hi Bart, Hi All,

I also was unable to reproduce the error when hand-starting the program, but today, I witnessed another occasion of the error after the automated start. I have attached the files, before todays write attempt, and after, with the older part of the file overwritten by garbage.

Thank you for your test programs, I will try them

Regarding your questions:
I am absolutely sure that the three write locations are never the same. y:\ is a network share on the dsl router. k:\ is a local usb stick which is actually rarely used, the write fails when the stick is not present, but its the last write, the others are already complete when that happens.
v:\ is a local ntfs-formatted drive pool created by Flexraid's Transparent Raid from the local drives e:\ and f:\. Another drive without a letter is used to store parity information to recover data stored on e:\ and f:\ in case of a drive failure. The local drives c:\ and d:\ also exist in the system but are not part of the RAID.

Ulrich Thome

2015-04-05 18:36

reporter   ~0082635

Hi Bart,
I am afraid but I am not quite sure how to use your test programs. There is no appenddefs.pp file in the zip archive, and there is a main.pp which you do not mention. Furthermore, there is appendcheck.lpr - do I have to install Lazarus to compile it?
Furthermore, I did not find a constant definition of TestFilename which I am supposed to modify.

Bart Broersma

2015-04-05 23:40

reporter  

append-test2.zip (69,732 bytes)

Bart Broersma

2015-04-05 23:56

reporter   ~0082644

Last edited: 2015-04-05 23:57

View 2 revisions

Sorry, first append-test.zip was incomplete, can someone delete it?
Attched append-test2.zip.

appenddefs.pp - common unit for appendcheck and appendbug program
appendcheck.lpr - appendcheck program (does NOT need Lazarus)
appendbug.pp - appendbug program (does NOT need Lazarus)

gappendtest.lpi - Lazarus graphical program that runs appendbug/appendcheck
gappendtest.lpr
main.lfm
main.pp
gappendtest.ico

appendbug is the program that updates the 'testdatei.txt' file.
appedcheck is the program that checks the integrity of 'testdatei.txt'.
You can compile both regularly with fpc.
The TestFilename const is in appenddefs.pp

I have used the gappendtest program to automatically run cycles of appendbug + appendcheck, but you can achieve the same with a batch job (just end the job if errorlevel is not 0)

(typed from old DOS memory)

@echo off
:start
path\to\appendbug.exe
REM checks for any errolevel > 0
if errorlevel 1 goto quit

path\to\apendcheck.exe
if errorlevel 1 goto quit
goto start

:quit
echo Error detected!
pause

Ulrich Thome

2015-04-06 15:21

reporter   ~0082661

Last edited: 2015-04-06 15:29

View 5 revisions

Hi Bart,
I have put your programs to test, although there was a little problem that prevented them from working properly at the first attempt. The length of the date was 10, and therefore, the next item, Tag (1), was stuck right after the date (06.04.2015) without a space in between, resulting in 06.04.20151. This led to an immediate failure report of appendcheck, because it found only 21 instead of 22 items, because it was reading 06.04.20151 as one item.
I have fixed that problem by inserting an additional line into appenddefs (see attachment, line 92). Now your programs are running - so far error free. If no problem shows up, I will let appendbug run by the scheduler, once daily. Usually, we should see an error within one week.
sincerely

Ulrich Thome

2015-04-06 15:22

reporter  

appenddefs.pp (3,704 bytes)

Bart Broersma

2015-04-06 18:57

reporter   ~0082676

Success whith testing

{
I counted wrong (due to testing on day < 10 and my formatsettings).
You could have just changed
- while Length(Result) < 10 do Result := Result + 0000032;
to
- while Length(Result) < 12 do Result := Result + 0000032;
}

Ulrich Thome

2015-04-07 23:29

reporter  

testfiles150407.zip (736 bytes)

Ulrich Thome

2015-04-07 23:43

reporter   ~0082705

Hi Bart,

Running appendbug/appendcheck by the suggested batch job yesterday produced no errors during half an hour of continuous repetitions.

Thereafter, I deleted testdatei, ran appendbug once more to re-create it an then put appendbug on the scheduler, to be run once a day. Today, testdatei was already screwed up. I have uploaded the files "before" and "after" for your review.
Please note: the program si5000, compiled with fpc 2.4.0, ran by the scheduler at about the same time, and logged the data correctly into its log file. Strange.

The last occasions of this error always occurred when the log file size exceeded 512 bytes for the first time. Not before, not thereafter. Does that make sense? Is there a buffer of 512 bytes, and things go awry when the buffer size is exceeded during the write?
I am just speculating.

jamie philbrook

2015-04-08 01:53

reporter   ~0082707

it maybe unrelated however, years ago with Turbo Pascal I do remember issues
with file handling..
  
 Can you try doing this prior to any file handling.
 
 InOutRes := 0;

 Also, please test the value of the file handle to make sure it is not
in the range of the input/output handles or make sure it does not match
those handles.
 Something is telling there was an issue with that once before.

 I have seen the Sector size errors like that before and it was usually caused by a pre-exiting file error that was still hanging around.

 I would need to examine the RTL but it's possible something is getting over looked.

Jeppe Johansen

2015-04-08 12:36

developer   ~0082711

That looks like an OpenDocument header. Did you open the file with openoffice between calling the programs, or create the file as a OpenDocument file?

Thaddy de Koning

2015-04-08 14:09

reporter   ~0082712

Last edited: 2015-04-08 15:14

View 3 revisions

I can confirm this indeed looks like an opendocument header.
The default to save with openoffice or libreoffice is in odt format.

If confirmed, please use something like notepad or geany or explicitly save as txt.
[edit]
In short, if this is the case, use a programmers editor to examine the output, not a wordprocessor. You can also use fp or lazarus, of course

Ulrich Thome

2015-04-08 21:39

reporter   ~0082722

Last edited: 2015-04-08 21:40

View 2 revisions

Open document header? This is interesting, but leaves me wondering where it comes from. No, the files were never touched by Open/Libre Office.

OpenOffice and Liberoffice are not even installed on the server, where the Pascal programs run. It is really a plain server, which is never used for anything else.

Opening of the log files to check their contents is only done with a plain text editor.

The server does store other open document files in other folders of the v:\pool drive which are shared over the network. However, the folder holding the log files is only shared with my own account.

The only explanation I could think of is a buffer overrun problem which leads to open office code from other files being erroneously written into the text files.

Today, there were no errors. Will check again tomorrow.

Thaddy de Koning

2015-04-08 22:12

reporter   ~0082724

Last edited: 2015-04-08 22:13

View 2 revisions

In the case Bart's test fails, *what are the file permissions?*
That's the only thing I can think of now. This may be related to other software running than just fpc compiled code.

What I see is really a odt header.

Ulrich Thome

2015-04-12 22:17

reporter  

testdatei150412.txt (666 bytes)   
PKm��F^�2''mimetypeapplication/vnd.oasis.opendocument.textPKm��FConfigurations2/floater/PKm��F'Configurations2/accelerator/current.xmlPKPKm��FConfigurations2/images/Bitmaps/PKm��FConfigurations2/progressbar/PKm��FConfigurations2/menubar/PKm��FConfigurations2/popupme12.04.2015 7        13:17:00  NO_METER  1204      1191      1091      1087      2230      2274      35        36        1922       ?        97         ?        612        ?        612        ?        217235     ?        
testdatei150412.txt (666 bytes)   

Ulrich Thome

2015-04-12 22:38

reporter   ~0082854

It happened again today. Test file overwritten by the odt header (see upload). But not the other real log files. Note: the program writing to the test file was compiled with fpc 2.6.4, the other programs writing to the real lotg files with 2.4.0. It still hold true that I have 100% reliability with fpc 2.4.0 compilates.

At the time when the malfunction happened, my son was working on one of the client computers with open office and edited a file that was stored on a network share of the server. The pascal programs run on the server. This is the only connection to odt I can see.

To test this further, I ran Bart's test as a batch for several minutes. At the some time, I opened open office on a client and opened, edited and closed odt files sitting on the server. No errors.

There is no open office installed on the server.

I am still puzzled that it still seems to be related to the compiler version (2.6.4 vs. 2.4.0). Furthermore, I am puszzled that I con only see the error when the programs are started through the "Aufgabenplanung" of Windows, not when I am logged on and do it by hand.

Now, I have deleted all log files from the server and will have all pascal programs start them from scratch. We'll see which ones produce errors.

Cyrax

2015-04-12 22:54

reporter   ~0082855

Looks like that something is amiss with OS file cache functionality over network share. Do you have checked that you have all important updates applied to your Windows Server OS? Or to your host OS?

Maybe forcing file cache flush can help in this matter?

Is that Flexraid a software RAID? Maybe updating that would help, too?

Mark Morgan Lloyd

2015-04-13 00:04

reporter   ~0082863

It reminds me of the "opportunistic locking" that used to cause things like Paradox so much grief on Windows.

Ulrich Thome

2015-04-14 22:26

reporter  

testdatei150414.txt (666 bytes)   
���JFIF``��C


		
%# , #&')*)-0-(0%()(��C



(((((((((((((((((((((((((((((((((((((((((((((((((((��KK"��	
���}!1AQa"q2���#B��R��$3br�	
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz������������������������������������������������������������������������	
���14.04.2015 2        13:17:00  NO_METER  1204      1191      1091      1087      2230      2274      35        36        1922       ?        97         ?        612        ?        612        ?        217235     ?        
testdatei150414.txt (666 bytes)   

Ulrich Thome

2015-04-14 22:27

reporter  

soldatei1.txt (338 bytes)   
Datum	Tag	Zeit	ID	S1Pm	S1Ps	S2Pm	S2Ps	ACPm	ACPs	DCTemp	ACtemp	ETag_Wh	dtag	EWoch_hWh	dwoch	EMon_hWh	dmon	EJahr_hWh	djahr	Eges_hWh	dges
13.04.2015	1	13:14:01	SI 5000 DE	1509	1587	1355	1413	2787	2981	47	49	9633		96		1962		8492		225115	
14.04.2015	2	13:14:01	SI 5000 DE	1325	1358	1185	1210	2465	2562	48	51	12286		287		2153		8683		225306	
soldatei1.txt (338 bytes)   

Ulrich Thome

2015-04-14 22:27

reporter  

stromdat.txt (312 bytes)   
Datum	Tag	Zeit	Haelfte	Verbrauch	d_v	Einspeisung	d_e	Datum	Tag	Zeit	Haelfte	Verbrauch	d_v	Einspeisung	d_e	
13.04.2015	1	07:10:00	1	   2997.160		    775.607		13.04.2015	1	21:10:01	2	   3001.867		    797.189		
14.04.2015	2	07:10:01	1	   3006.363		    797.253		14.04.2015	2	21:10:00	2	   3012.709		    826.528		
stromdat.txt (312 bytes)   

Ulrich Thome

2015-04-14 22:42

reporter   ~0082905

It happened again today. The fpc 2.6.4 compiled test program wrote garbage to the test file, the two other logging programs, compiled with fpc 2.4.0, one of them running at about the same time, worked fine.

I have reproduced these malfunctions numerous times now, but only with fpc 2.6.4 compiled programs.

Of course, I am using the newest version of flexraid's t-raid and all updates are installed on the server and clients. If not, it would still not explain why programs compiled with fpc 2.4.0 are not affected.

Since I have not written fpc and have only a very crude idea of how to program a compiler, I have no idea what is going wrong. I am only getting more and more convinced that this issue is really related to the compiler version. I hope that somebody may identify the problem, because otherwise, future versions may also be affected.

About Flexraid's t-raid: Flexraid's t-raid takes one of the drives in the server to store parity information derived from the data drives. This drive is not formatted, parity information is stored sector by sector. The parity information allows restoring a data drive if it fails. The data drives, however, are not changed. You can take any of the data drives out of the system, hook it up to any computer and read the data as with any ordinary drive. For more information see flexraid.com

Cyrax

2015-04-15 03:59

reporter   ~0082921

Last edited: 2015-04-15 04:17

View 2 revisions

Perhaps changing "{$MODE DELPHI}" to "{$MODE OBJFPC}{$H+}" gives different behavior.

Maybe the windows scheduler treats your console program as an service program which does have different rules than ordinary console ones.

Ulrich Thome

2015-04-16 23:07

reporter  

testdatei150416.txt (666 bytes)   
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate]
 "WUServer"="http://Figaro:8530"
 "WUStatusServer"="http://Figaro:8530"
 "ElevateNonAdmins"=dword:00000001
 "TargetGroupEnabled"=dword:00000001
 "TargetGroup"="manupdate"
 "AcceptTrustedPublisherCerts"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU]  
 "NoAUShutdownOption"=dword:0016.04.2015 4        13:17:00  NO_METER  1204      1191      1091      1087      2230      2274      35        36        1922       ?        97         ?        612        ?        612        ?        217235     ?        
testdatei150416.txt (666 bytes)   

Ulrich Thome

2015-04-16 23:16

reporter   ~0082981

Hi again,

i cqan report another garbage file, it is attached. This time, the garbage seems to be derived from a registry file that had beed edited the day before.

Note: The logging programs compiled with fpc 2.4.0 performed well today. They logged the data as they should. Not the test program compiled with fpc 2.6.4, it produced a corrupt file. This behaviour was also observed when the logging programs were compiled with fpc 2.6.4. It is really the compiler version that makes a difference,so I would suggest that somebody who understands the compiler function has a look. There is nothing wrong with my computer, otherwise, I should see garbage also in other files and also with the fpc 2.4.0 compiled loggers, which has never been the case.

What is interesting. The test program creates a file with a length of 444 bytes. The next day, it appends 222 bytes to a total length of 666 bytes. This is when the error happens and the older part of the file is overwritten by something else. Appending data to a longer file goes error free. this seems to be related to a 512 bytes buffer somewhere???

Cyrax

2015-04-16 23:29

reporter   ~0082982

Can you use FPC trunk (3.1.1) for testing? 2.6.4 is no more in active development phase.

I'm starting to believe that Flexraid (it seems that it does not flush system/its own file cache) somehow interferes with your logging program. If it is possible, please disable it for time being.

Also you could try to schedule this little utility program https://technet.microsoft.com/en-us/sysinternals/bb897438 to run before your logging program. It will force system to flush its file cache to disk.

Stephano

2015-04-17 08:42

developer   ~0082988

A fact that might be related/helpful:
Just attempting to open a 0 byte file with LibreOffice Calc 3.3.4 (and aborting at the filter selection dialogue) will overwrite the file with 1536 bytes.

I have no idea if this happens as well if the file is locked in some odd way that Calc cannot read it but can write to it.

Thaddy de Koning

2015-04-17 14:03

reporter   ~0083000

Last edited: 2015-04-17 14:15

View 3 revisions

Is your configuration in *any* way compromized by lesser known or dinosaur security solutions,? It looks like something is hooking into file IO without cleaning up properly. Plz check this scenario. I once - long time ago - made "cleaver" code that did the same irresponsible thing and it is still endemic.
I especially recommend looking for mcafee and norton because with these almost everything is a hook into something. More modern engines do their job mostly quite different. (AVG, Kasperski, the German trio)

My suspicion is it might happen in user space (ring 3), not driver space. It can however happen in driver space (ring 0) too.
Did you do a rootkit check?

Ulrich Thome

2015-05-02 13:57

reporter   ~0083393

Hi,

after some time of observing and testing, I am back. I have found multiple more occasions of corrupted files. Log files are obviously only in danger when they are small, when they have grown past a certain limit they are somehow protected.

Also, I have repeatedly erased the other log files, which are written to by the fpc 2.4.0 log programs, to get them re-started from a small size. On two occasions, they were also corrupted, just not as frequently as the one done by the test program (which is corrupted every time when the size increases from 444 to 666 bytes. As the other log files are written to by fpc 2.4.0 compiled programs, I have to state that using the older compiler is only partly, not entirely protective.

@Thaddy: The security solution I am using is Avast 8 WHS edition. What do you think of this? Will do a rootkit check.

@Cyrax: I also share your suspicion of Flexraid. I will try Sync as you recommended, and depending on the result, will contact the Flexraid developer.
How did you conclude that Flexraid does not flush its caches? Did you try it yourself?

Cyrax

2015-05-02 15:41

reporter   ~0083397

@Ulrich Thome : Nope. I didn't try it myself because I don't have necessary equipment at hand. Also I want keep my systems tidy as possible.

The conclusion which I came, was made from your findings that the corrupted log files did have extra data from files which were edited previously. Thus there is high change that OS file cache and Flexraid ones is somehow conflicting each other.

Ulrich Thome

2015-05-08 23:27

reporter   ~0083529

Just a brief note: Schedluing a Sync before the test program did not help. The error still occurs every time the log file in appended from 444 to 666 bytes length.

Ulrich Thome

2015-05-10 21:45

reporter   ~0083557

Hi,
I have recompiled the test program with fpc 3.1.1 Same problem. Corrupted log files.
I am now going to redirect the write to a drive that is not governed by flexraid. Will report back tomorrow!

Marco van de Voort

2015-05-31 22:15

manager   ~0084144

Last chance at feedback before closing after 2 1/2 months with umpteen comments and still no reproducable behaviour.

Thaddy de Koning

2015-06-01 08:00

reporter   ~0084147

Last edited: 2015-06-01 08:01

View 2 revisions

But a difference in behavior between 2.4 and 2.6.4 and higher regarding to something as important as file IO is of genuine concern.

Even if hard to reproduce. I've seen worse in my career.

I agree in so far that I was able to spend some time on it, with a similar configuration (software raid, NAS) and was unable to reproduce.

Ulrich Thome

2015-06-20 18:23

reporter   ~0084591

Hi, I am back

I have spent the time with more testing, and found that the problem only occurs when the write is directed to the pool drive (v:\) created by Flexraid's traid software. The v:\ drive is made up by pooling the e:\ and f:\ drives. When the write is directed to a real drive, such as e:\, the error does not occur. When the write is directed to the v:\ pooled drive, the error is easily reproduced. This is interesting because the written data goes to the same physical position, because the e:\ is part of the pool.

With this finding, I have contacted the programmer of Flexraid. Here is what he wrote:

"I think your issues might be related to how you wrote your program.
When you allocate memory for a buffer, it contains garbage data. All operating systems require that write buffers be 512 bytes aligned. So, when writing only 222 bytes, the question now becomes in what way are you doing so?
Basically, you simply CANNOT write 222 bytes to a file. What you need to do is read 512 bytes from the file into a buffer, write the 222 bytes into the buffer, and then write the buffer to the file.
Your test program and its flaw basically sent you to a different course than what your original issue is."

Do you agree with him? Is the test program flawed by only writing 222 bytes? Do I have to create a buffer in my Pascal programs, read the file, add the data, and write the whole thing back to the disk? I always assumed the compiler would take care of such issues. Unfortunately, I have no idea how the write operation is internally handled inside the fpc compiled programs.

Furthermore, the programmer's response does not acknowledge the fact that the problem is clealy related only to the pool drive.

Please give me your comments!

Max Nazhalov

2015-06-20 20:11

reporter   ~0084596

Last edited: 2015-06-20 20:31

View 4 revisions

You never need to read-modify-write sequence for files, such things are entirely covered by OS [core/drivers/cache]. Even in MS-DOS etc..

>> "Basically, you simply CANNOT write 222 bytes to a file. What you need to do is read 512 bytes from the file into a buffer, write the 222 bytes into the buffer, and then write the buffer to the file."

Surely You CAN. OS always does this sequence for you, if it ever needed.
This sentence only confirms a bug in a lower level (possibly, driver that handles "pooling").

Neither the compiler, nor RTL does care of what the file-alignment is for the data writing to a file -- it is entirely out of their scope.

[Edit: "All operating systems require that write buffers be 512 bytes aligned." I'm sad.. ;-) Indeed, it is an OS job to make sure its write buffers are properly aligned, but not all way around.]

Marco van de Voort

2015-06-20 21:08

manager   ~0084598

Last edited: 2015-06-20 21:19

View 2 revisions

I agree with Max. If what he says is true, it is a requirement for interaction between OS and the flexraid block devicedriver, not applications and flexraid.

Applications are not even supposed to be aware of the blocksize

Anyway, now it is pinpointed to flexraid, I think we can close the ticket, since there is nothing to change on the FPC side.

Marco van de Voort

2015-06-20 21:20

manager   ~0084599

Close for now.

Ulrich Thome

2015-07-04 00:07

reporter   ~0084857

I have done some more testing and found that if the flush command is inserted before the first write, the error does not occur. I would be really grateful if you told me whether the flush command is actually mandatory (the Flexraid developer insisted that it's my programmig (or Bart's who created the test program), rather than a bug in Flextaid).

Here is the code:

procedure schreiben(logdatei:string);
var
  ablage : TextFile;
begin
  assignfile(ablage,logdatei);
  try
    if FileExists(logdatei) then
    begin
      //reset(ablage);
      //closefile(ablage);
      append(ablage)
    end
    else
    begin
      rewrite(ablage);
      writeln(ablage, Header);
    end;
    flush(ablage); // this flush prevents the described error;
    write(ablage, Pad(datum));
    write(ablage, Pad(tag));
    write(ablage, Pad(zeit));
    write(ablage, Pad(id));
    write(ablage, Pad(s1pm));
    write(ablage, Pad(s1ps));
    write(ablage, Pad(s2pm));
    write(ablage, Pad(s2ps));
    write(ablage, Pad(acpm));
    write(ablage, Pad(acps));
    write(ablage, Pad(dctemp));
    write(ablage, Pad(actemp));
    write(ablage, Pad(entag));
    write(ablage, Pad('?'));
    write(ablage, Pad(enwoch));
    write(ablage, Pad('?'));
    write(ablage, Pad(enmon));
    write(ablage, Pad('?'));
    write(ablage, Pad(enjahr));
    write(ablage, Pad('?'));
    write(ablage, Pad(enges));
    write(ablage, Pad('?')); //22 items
    writeln(ablage);
    closefile(ablage);

Jonas Maebe

2015-07-04 12:56

manager   ~0084859

The flush is not mandatory. It causes the current contents internal file buffer (internal in the FPC RTL) to be written to disk, instead of this happening once that buffer is full.

You can try running Process Monitor (https://technet.microsoft.com/en-us/sysinternals/bb896645.aspx ), setting it to filter only operations performed by your program, to figure out what the difference is at the system level between the two versions of the program. I've never used it (I don't even have Windows), but it's recommended at http://stackoverflow.com/questions/3847745/systrace-for-windows (there are some other tools you can try as well, such as the drstrace tool of Dr. Memory)

Marco van de Voort

2015-07-06 16:06

manager   ~0084874

Let the Flexraid developer name the MSDN articles for the relevant functions that FPC calls.

I don't see any such requirements in e.g. writefile (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365747%28v=vs.85%29.aspx).

Thaddy de Koning

2015-07-06 19:48

reporter   ~0084876

Last edited: 2015-07-06 19:51

View 2 revisions

Also, I might add that part of the "info" above, the bogus information that is stored, comes from a very different computer than the one you use yourself: your son's one.
Since the raid software is supposed to take care of any contention between writes from two or more different locations, I suggest that the most likely issue is simply a wrong sub-allocator before anything is physically written to any disk. The sub-allocator is part of the driver software for the raid, not Freepascal's.
It seems - or better, you demonstrated to us - the blocks can be dirty, contaminated if you wish, from/by different locations. That should never be possible in a raid (any level).

[edit]

I tested a similar configuration like yours with the example code.
I used the Linux kernel raid features with standard linux Debian64 and could not reproduce it, as I reported before.

Marco van de Voort

2015-09-27 15:34

manager   ~0086106

No answer on substantiation of Flexraid claims -> close.

Issue History

Date Modified Username Field Change
2015-03-14 16:46 Ulrich Thome New Issue
2015-03-14 16:46 Ulrich Thome File Added: bug_report_files.ZIP
2015-03-14 17:04 Thaddy de Koning Note Added: 0081955
2015-03-14 17:23 Thaddy de Koning Note Edited: 0081955 View Revisions
2015-03-14 17:24 Thaddy de Koning Note Edited: 0081955 View Revisions
2015-03-14 17:26 Thaddy de Koning Note Edited: 0081955 View Revisions
2015-03-14 17:27 Thaddy de Koning Note Edited: 0081955 View Revisions
2015-03-14 17:28 Thaddy de Koning Note Edited: 0081955 View Revisions
2015-03-14 18:22 Marco van de Voort Note Added: 0081958
2015-03-14 18:51 Bart Broersma Note Added: 0081961
2015-03-14 18:54 Bart Broersma Note Edited: 0081961 View Revisions
2015-03-14 21:39 Ulrich Thome Note Added: 0081968
2015-03-14 21:43 Ulrich Thome Note Edited: 0081968 View Revisions
2015-03-14 21:47 Ulrich Thome Note Edited: 0081968 View Revisions
2015-03-15 12:46 Bart Broersma Note Added: 0081984
2015-03-18 22:23 Ulrich Thome Note Added: 0082100
2015-03-21 15:54 Ulrich Thome File Added: bug_report_files_2.ZIP
2015-03-21 16:14 Ulrich Thome Note Added: 0082164
2015-03-29 22:19 Ulrich Thome Note Added: 0082447
2015-03-29 22:45 Mark Morgan Lloyd Note Added: 0082449
2015-03-30 14:54 Marco van de Voort Note Added: 0082464
2015-03-30 17:22 Thaddy de Koning Note Added: 0082471
2015-03-30 17:24 Thaddy de Koning Note Edited: 0082471 View Revisions
2015-03-30 17:25 Thaddy de Koning Note Edited: 0082471 View Revisions
2015-04-01 22:47 Ulrich Thome Note Added: 0082539
2015-04-02 02:08 Cyrax Note Added: 0082545
2015-04-04 16:04 Bart Broersma Note Added: 0082586
2015-04-04 23:30 Bart Broersma File Added: append-test.zip
2015-04-04 23:42 Bart Broersma Note Added: 0082598
2015-04-04 23:42 Bart Broersma Note Edited: 0082598 View Revisions
2015-04-04 23:43 Bart Broersma Note Edited: 0082598 View Revisions
2015-04-04 23:44 Bart Broersma Note Edited: 0082598 View Revisions
2015-04-04 23:44 Bart Broersma Note Edited: 0082598 View Revisions
2015-04-05 17:54 Ulrich Thome File Added: fehler150405.zip
2015-04-05 18:04 Ulrich Thome Note Added: 0082630
2015-04-05 18:36 Ulrich Thome Note Added: 0082635
2015-04-05 23:40 Bart Broersma File Added: append-test2.zip
2015-04-05 23:56 Bart Broersma Note Added: 0082644
2015-04-05 23:57 Bart Broersma Note Edited: 0082644 View Revisions
2015-04-06 15:21 Ulrich Thome Note Added: 0082661
2015-04-06 15:22 Ulrich Thome File Added: appenddefs.pp
2015-04-06 15:23 Ulrich Thome Note Edited: 0082661 View Revisions
2015-04-06 15:26 Ulrich Thome Note Edited: 0082661 View Revisions
2015-04-06 15:29 Ulrich Thome Note Edited: 0082661 View Revisions
2015-04-06 15:29 Ulrich Thome Note Edited: 0082661 View Revisions
2015-04-06 15:44 Jonas Maebe File Deleted: append-test.zip
2015-04-06 18:57 Bart Broersma Note Added: 0082676
2015-04-07 23:29 Ulrich Thome File Added: testfiles150407.zip
2015-04-07 23:43 Ulrich Thome Note Added: 0082705
2015-04-08 01:53 jamie philbrook Note Added: 0082707
2015-04-08 12:36 Jeppe Johansen Note Added: 0082711
2015-04-08 14:09 Thaddy de Koning Note Added: 0082712
2015-04-08 14:11 Thaddy de Koning Note Edited: 0082712 View Revisions
2015-04-08 15:14 Thaddy de Koning Note Edited: 0082712 View Revisions
2015-04-08 21:39 Ulrich Thome Note Added: 0082722
2015-04-08 21:40 Ulrich Thome Note Edited: 0082722 View Revisions
2015-04-08 22:12 Thaddy de Koning Note Added: 0082724
2015-04-08 22:13 Thaddy de Koning Note Edited: 0082724 View Revisions
2015-04-12 22:17 Ulrich Thome File Added: testdatei150412.txt
2015-04-12 22:38 Ulrich Thome Note Added: 0082854
2015-04-12 22:54 Cyrax Note Added: 0082855
2015-04-13 00:04 Mark Morgan Lloyd Note Added: 0082863
2015-04-14 22:26 Ulrich Thome File Added: testdatei150414.txt
2015-04-14 22:27 Ulrich Thome File Added: soldatei1.txt
2015-04-14 22:27 Ulrich Thome File Added: stromdat.txt
2015-04-14 22:42 Ulrich Thome Note Added: 0082905
2015-04-15 03:59 Cyrax Note Added: 0082921
2015-04-15 04:17 Cyrax Note Edited: 0082921 View Revisions
2015-04-16 23:07 Ulrich Thome File Added: testdatei150416.txt
2015-04-16 23:16 Ulrich Thome Note Added: 0082981
2015-04-16 23:29 Cyrax Note Added: 0082982
2015-04-17 08:42 Stephano Note Added: 0082988
2015-04-17 14:03 Thaddy de Koning Note Added: 0083000
2015-04-17 14:07 Thaddy de Koning Note Edited: 0083000 View Revisions
2015-04-17 14:15 Thaddy de Koning Note Edited: 0083000 View Revisions
2015-05-02 13:57 Ulrich Thome Note Added: 0083393
2015-05-02 15:41 Cyrax Note Added: 0083397
2015-05-08 23:27 Ulrich Thome Note Added: 0083529
2015-05-10 21:45 Ulrich Thome Note Added: 0083557
2015-05-31 22:15 Marco van de Voort Note Added: 0084144
2015-05-31 22:15 Marco van de Voort Assigned To => Marco van de Voort
2015-05-31 22:15 Marco van de Voort Status new => feedback
2015-06-01 08:00 Thaddy de Koning Note Added: 0084147
2015-06-01 08:01 Thaddy de Koning Note Edited: 0084147 View Revisions
2015-06-20 18:23 Ulrich Thome Note Added: 0084591
2015-06-20 18:23 Ulrich Thome Status feedback => assigned
2015-06-20 20:11 Max Nazhalov Note Added: 0084596
2015-06-20 20:17 Max Nazhalov Note Edited: 0084596 View Revisions
2015-06-20 20:28 Max Nazhalov Note Edited: 0084596 View Revisions
2015-06-20 20:31 Max Nazhalov Note Edited: 0084596 View Revisions
2015-06-20 21:08 Marco van de Voort Note Added: 0084598
2015-06-20 21:19 Marco van de Voort Note Edited: 0084598 View Revisions
2015-06-20 21:20 Marco van de Voort Note Added: 0084599
2015-06-20 21:20 Marco van de Voort Status assigned => resolved
2015-06-20 21:20 Marco van de Voort Resolution open => no change required
2015-07-04 00:07 Ulrich Thome Note Added: 0084857
2015-07-04 00:07 Ulrich Thome Status resolved => feedback
2015-07-04 00:07 Ulrich Thome Resolution no change required => reopened
2015-07-04 12:56 Jonas Maebe Note Added: 0084859
2015-07-06 16:06 Marco van de Voort Note Added: 0084874
2015-07-06 19:48 Thaddy de Koning Note Added: 0084876
2015-07-06 19:51 Thaddy de Koning Note Edited: 0084876 View Revisions
2015-09-27 15:34 Marco van de Voort Note Added: 0086106
2015-09-27 15:34 Marco van de Voort Status feedback => resolved
2015-09-27 15:34 Marco van de Voort Resolution reopened => unable to reproduce