From 8d7d02f42c3947f756c18cb4d37d9d97fbd0d27d Mon Sep 17 00:00:00 2001
From: Franck Cuny
Date: Wed, 10 Aug 2016 14:33:04 -0700
Subject: convert back to md

---
 posts/2014-11-20-opening-a-file.org | 176 ------------------------------------
 1 file changed, 176 deletions(-)
 delete mode 100644 posts/2014-11-20-opening-a-file.org

(limited to 'posts/2014-11-20-opening-a-file.org')

diff --git a/posts/2014-11-20-opening-a-file.org b/posts/2014-11-20-opening-a-file.org
deleted file mode 100644
index 63e4f6a..0000000
--- a/posts/2014-11-20-opening-a-file.org
+++ /dev/null
@@ -1,176 +0,0 @@
-A very common task for a programmer is to open a file. This seems to be
-a trivial operation, and we don't think twice about it. But what is
-really happening when we're opening that file ?
-
-** A simple C program
-
-For this exercise, I'm going to use this very simple C program:
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-The code does the following things:
-
-- opens a file in read-only mode
-- checks that we got a file descriptor
-- if we don't have the file descriptor, we print an error and exit
-- we close the file descriptor
-- we exit
-
-This is really simple and not much is going on, right ? Let's take a
-better look at it.
-
-The =fopen()= function that we use is provided by the libc. It's
-documentation is pretty straight forward (=man 3 fopen=): /"The fopen()
-function opens the file whose name is the string pointed to by path and
-associates a stream with it."/.
-
-** Run the program
-
-We're going to compile the source code first, so we can run the program:
-
-#+BEGIN_EXAMPLE
- gcc -o test test.c
-#+END_EXAMPLE
-
-** Overview
-
-First I want to have an overview of the execution of this program. For
-this we will use =strace=.
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-We can ignore most of that output, only the last few lines interest us.
-We can see two functions related to the code we wrote:
-
-- a call to open, with */etc/issue* as the first argument
-- a call to close, again, with 3 as the first argument
-
-The first function is the system call =open()=, and we see that it
-returns 3, which is our file descriptor. When =close()= is called, it's
-only argument is again 3, which is the file descriptor returned by
-=open()=, and then we exit.
-
-** Deeper
-
-Now let's invoke the program with gdb:
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-We can see the calls (the =callq= instructions) to our three functions:
-=fopen()=, =perror()= and =fclose()=, but we want to take a look at what
-exactly is behind this functions. Let's try to dig the =fopen=
-instruction a little bit more (I've removed all the lines that are not
-the =callq= instructions):
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-OK, so here we can see that we're calling the function
-=_IO_new_file_fopen()=.
-
-** libc
-
-In our program, we're using functions provided by the libc. We're going
-to take a look at =_IO_new_file_fopen=, and we can read the source
-[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L252][here]].
-
-Most of the function is to set a bunch of flags, and then the next call
-we're interested in is
-[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L335][=_IO_file_open=]].
-The function is defined
-[[http://fxr.watson.org/fxr/source/libio/fileops.c?v=GLIBC27#L217][here]].
-As you can see, here we end up calling =open()=.
-
-** system call
-
-The =open()= function is one of the linux system calls. If we look at
-[[http://lxr.free-electrons.com/source/include/linux/syscalls.h][the
-list of syscalls]], we can see that it is mapped to
-[[http://lxr.free-electrons.com/source/include/linux/syscalls.h#L512][=sys_open=]].
-
-The function is defined in
-[[http://lxr.free-electrons.com/source/fs/open.c#L992][fs/open.c]], and
-do a call to
-[[http://lxr.free-electrons.com/source/fs/open.c#L964][do\_sys\_open]].
-
-The interesting part of the function starts with the call to
-=get_unused_fd_flags()=, where we get a file descriptor. Then we do the
-call to =do_filp_open()=, where we end up (via more functions call):
-
-- geetting a file struct
-- find the inode
-- populate the file struct
-
-To finish, we do a call to =fsnotify()=, which will notify the watchers
-on this file, and add the file descriptor with the other struct files.
-
-** inodes
-
-To open a file, you need to locate it on the disk. A file is associated
-with an inode, which contains meta data about your file, and they are
-stored on your disk. When you want to reach a file, the kernel will find
-the inode and from that the location on the disk. You can read more
-about inodes on [[https://en.wikipedia.org/wiki/Inode][wikipedia]], and
-this [[https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout][great
-page about ext4]].
-
-You can run =man 1 stat= in your shell on the file to see the
-information we can find.
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-An inode is a data structure to represent an object on the filesystem.
-If you look at the previous output, you can see information like the
-size, the number of blocks, how many references exists to this file
-(links), etc.
-
-Here, we can see that the inode is 679618. Now let's take a look with
-the FS debugger:
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-There's many cools things you can do with inode, like using =man 1 find=
-to find a file based on it's inode instead of file name.
-
-** Deeper!
-
-Valgrind is another amazing tool to do analysis of a program. Let's
-recompile our binary with the =-g= option, to embed debugging
-information in our binary:
-
-#+BEGIN_EXAMPLE
- gcc -g -o test test.c
-#+END_EXAMPLE
-
-=valgrind= has an option =--tool= to use specific tool. Let's run
-valgrind with the *callgrind* tool, followed by =callgrind_annotate= to
-get a more readable output:
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-With the =--cache-sim=yes= option, we count all the instructions for
-read access, cache misses, etc. Another nifty tool is *cachegrind*,
-which shows the cache misses for different level of caches.
-
-#+BEGIN_HTML
-
-#+END_HTML
-
-** The end
-
-As you can see, using various tools (and there's more tools available!),
-you can see that opening a file involves a lot of operations behind the
-scene.
--
cgit v1.2.3