Why is 8 KB allocated on disk initially for a .txt file in Windows?

Well, give this a try:
Create a .txt file, write nothing to it, and look at its properties. It should read:
Size: 0 bytes
Size on disk: 0 bytes
Okay, now write just one character to it and look at its properties again:
Size: 1 byte
Size on disk: 8192 bytes
Now, if you take 1 character = 1 byte and keep writing until you reach 8193 bytes (characters), you will see the size on disk jump to the next limit, i.e. 16384 bytes. This is uniform for the Notepad .txt format!
For those whose size on disk starts at 16384 bytes, it should go up to 32 KB and its equivalent in bytes. The growth is linear as far as the math is concerned, but then explain this initial allotment of excess disk space. Further, if you open a Word .doc and check its properties with and without any characters entered, the space allotted on disk is NOT UNIFORM. Firstly, in a .doc 1 character is not equal to 1 byte, as I see it on my computer. Secondly, it starts with 16 KB of size on disk. Thirdly, a single character makes the size jump by a huge number of bytes.


So my big question is about this non-uniformity. You could run the same experiment with .bmp, .ppt, .pdf and other files...

Replies

  • xheavenlyx
    You know, reading your post was a little difficult, but I got the idea. And you know what... I like the curiosity, since I'm like that too 😀

    Ok, Creating a new file (test.txt) = 0 B, 0B on disk.

    Create new file + one char = 1 B, 4 KB on disk. 😀

    ......till......

    I ran some tests by using MS-DOS debug:

    I created a new file test.txt with this written: 'blah bl'. That is 7 bytes, space included!

    Running debug in DOS.

    D:\>debug test.txt
    -r
    AX=0000  BX=0000  CX=0007  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
    DS=137C  ES=137C  SS=137C  CS=137C  IP=0100   NV UP EI PL NZ NA PO NC
    137C:0100 37            AAA
    -
    Here, look at the CX register: it reads 0007, which is the size of the file that debug loaded into memory, i.e. 7 bytes!

    Now I will dump the contents of the file test.txt. You can see blah bl written.

    -d
    137C:0100  62 6C 61 68 20 62 6C 00-00 00 00 00 00 00 00 00   blah bl.........
    137C:0110  00 00 00 00 00 00 00 00-00 00 00 00 34 00 6B 13   ............4.k.
    137C:0120  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0140  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0150  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    
    Just ignore the '4.k.' that shows up in the ASCII column there; I am still trying to figure that out. No, it is not the 4 k as in 4 KB: I overwrote it and nothing happened! See below.

    Here I will create a file of exactly 4096 B (4 KB) in debug and save it.

    -f 100 1000 32
    -d
    137C:0100  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0110  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0120  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0130  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0140  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0150  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0160  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    137C:0170  32 32 32 32 32 32 32 32-32 32 32 32 32 32 32 32   2222222222222222
    -n c:\test2.txt
    -w
    -rcx
    CX 02FB
    :1000
    -n c:\test2.txt
    -w
    Writing 01000 bytes
    -
    Here, 'Writing 01000 bytes' is in hex, which is 4096 in decimal. Ok, we have created a file of exactly 4 KB 😀. Now, when I checked the properties, it said:

    4 KB size, 4 KB size on disk. No change!
    Well, let me add one more byte to the file to make it 4097 bytes.

    After doing that, the properties read:

    Size:           4.00 KB (4,097 bytes)
    
    Size on Disk:         8.00 KB (8,192 bytes)
    Observation:

    When the file size crosses the space already allocated to it, the file system allocates the next block (I think 4 KB blocks are a standard in NTFS), i.e. 4 more KB! So Windows keeps adding 4 KB each time the file crosses the next limit: 4 KB, 8 KB, 12 KB, etc.

    THIS WAS FOR .TXT or TEXT FILES!

    IED Kid, you were right in observing that different computers allocate differently. I think this might depend on the file system Windows is using, such as NTFS or FAT32.
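The rounding seen in these tests can be sketched in a few lines of Python. This is only an illustration, assuming the file system hands out space in whole fixed-size allocation units. The function name `size_on_disk` and the `cluster_size` parameter are made up for this example, and real file systems have special cases (compression, sparse files) that it ignores.

```python
def size_on_disk(file_size: int, cluster_size: int = 4096) -> int:
    """Round file_size (bytes) up to a whole number of allocation units.

    cluster_size is an assumed, fixed allocation unit in bytes.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    # Integer ceiling division: number of units needed to hold the file.
    clusters = (file_size + cluster_size - 1) // cluster_size
    return clusters * cluster_size


if __name__ == "__main__":
    # Compare the 4 KB unit from the debug test with the 8 KB unit
    # reported in the original post.
    for unit in (4096, 8192):
        for size in (0, 1, 4096, 4097, 8192, 8193):
            print(f"unit={unit:5d}  size={size:5d}  on disk={size_on_disk(size, unit):5d}")
```

With a unit of 4096 this gives the 4 KB to 8 KB jump at 4097 bytes from the debug test; with a unit of 8192 it gives the 8192 to 16384 jump at 8193 bytes from the original post.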
  • xheavenlyx
    *** Before I start: this post and the previous one assume you are familiar with hex dumps and how to read them. Hopefully this is helpful. ***

    Ok, about .DOC files: it's not easy to work out exactly how Windows allocates the space for them. I think every version does it differently, since an MS Word file is not plain text; it contains formatting data, header info, sizes, signatures, etc. A lot of extra baggage, which is why it's usually bigger than a .txt file.

    As for JPG or BMP, it actually depends on the file; there are a few types of JPEG and BMP. Here's my test on a BMP:

    A 2 x 2 pixel BMP is 70 bytes; of course it contains much more than just the pixel data. The image I drew in Paint has the first pixel black and the remaining ones white. The debug session:

    D:\>debug test.bmp
    -r
    AX=0000  BX=0000  CX=0046  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
    DS=137C  ES=137C  SS=137C  CS=137C  IP=0100   NV UP EI PL NZ NA PO NC
    137C:0100 42            INC     DX
    -
    Notice CX = 0046 hex, meaning 70 in decimal. That's the size.

    The HEX dump information of this file:

    -d 100 146
    137C:0100  42 4D 46 00 00 00 00 00-00 00 36 00 00 00 28 00   BMF.......6...(.
    137C:0110  00 00 02 00 00 00 02 00-00 00 01 00 18 00 00 00   ................
    137C:0120  00 00 10 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    137C:0130  00 00 00 00 00 00 FF FF-FF FF FF FF 00 00 00 00   ................
    137C:0140  00 FF FF FF 00 00                               ......
    -
    To decode the above, here is the header information from #-Link-Snipped-#

    The first two bytes have to be 42 4D ('BM').

    At byte offset 2 is the file size: 46 hex = 70 bytes!

    At byte offset 10 is the offset to the start of the 'image', i.e. 36 hex. Notice that address 0136 in the dump is where FF FF... starts, which is the image.

    Byte offset 14: size of the BITMAPINFOHEADER structure, must be 40 (28 hex).

    Byte offset 18: image width in pixels (here 02).

    Byte offset 22: image height in pixels (here 02).

    Byte offset 26: number of planes in the image, must be 1.
    (Byte offsets are counted from 0 and given in decimal; convert them to hex and add 100 hex to find them in the debug hex dump above.)

    So those are the very basics of the BMP header, and why the file is more than just several bytes.
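To double-check the header fields in this post, here is a minimal Python sketch that unpacks them from the 70 bytes of the dump. It assumes the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER, all little-endian. `parse_bmp_header` and `row_stride` are illustrative names, not part of any library.

```python
import struct

# The 70 bytes shown by "-d 100 146" in the debug session.
DUMP = bytes.fromhex(
    "42 4D 46 00 00 00 00 00 00 00 36 00 00 00 28 00 "
    "00 00 02 00 00 00 02 00 00 00 01 00 18 00 00 00 "
    "00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 FF FF FF FF FF FF 00 00 00 00 "
    "00 FF FF FF 00 00"
)


def parse_bmp_header(data: bytes) -> dict:
    """Unpack the fields discussed above (assumes a BITMAPINFOHEADER)."""
    # Offsets 0..13: magic, file size, two reserved words, pixel data offset.
    magic, file_size, _res1, _res2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    # Offsets 14..29: header size, width, height, planes, bits per pixel.
    header_size, width, height, planes, bpp = struct.unpack_from("<IiiHH", data, 14)
    return {
        "magic": magic,
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "header_size": header_size,
        "width": width,
        "height": height,
        "planes": planes,
        "bits_per_pixel": bpp,
    }


def row_stride(width: int, bits_per_pixel: int) -> int:
    """Bytes per pixel row, padded up to a multiple of 4 bytes."""
    return ((width * bits_per_pixel + 31) // 32) * 4


if __name__ == "__main__":
    header = parse_bmp_header(DUMP)
    for name, value in header.items():
        print(f"{name:15s} {value}")
    stride = row_stride(header["width"], header["bits_per_pixel"])
    print("row stride     ", stride)
    print("header + pixels", header["pixel_offset"] + stride * header["height"])
```

The row padding explains the total: each 2-pixel row at 24 bits per pixel takes 6 bytes, padded to 8, so two rows take 16 bytes, and 54 bytes of headers plus 16 bytes of pixels gives the 70 bytes in CX.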
  • Ashraf HZ
    Great analysis there, xheavenlyx!
  • th3 ied kid
    Hi xheavenlyx,
    Thanks a lot, you've done loads of analysis. And
    now a bit of analysis of my own on .doc files has absolutely shocked me! I wrote a counted number of characters in a doc and copied the same text into another NEW doc I created, and I ended up with two docs with different names and the same content, yet WITH DIFFERENT FILE SIZES!! Okay, I have to admit
    I can't pull off a clear analysis of MS Word as a rookie!
    Well, thanks again for the analysis of BMP and JPEG files too!
