Storing Images in the Database

With small and relatively simple applications it is fairly common practice to keep any image files separate from the database. This makes accessing the images very slightly easier and means that the images take up less space. It also means that you need to use code in your application to maintain the relationships between the image and the rest of the data rather than letting the database look after the relationships for you.

Once an application starts to become more complicated the benefits of actually storing the images in the database along with the rest of the data start to grow and the disadvantages of storing the images in the database become less important in comparison with the advantages.

You don't need the situation to be all that complex in order for the benefits of storing the image inside the database to become evident. Let's consider some simple situations that become horribly complex when you move the images out of the database.

Case 1. A transaction you are running includes the deletion of three images. With everything in the database including the images it is simply a matter of running all the queries and if any query fails then you do a ROLLBACK. If all the queries including the ones to delete the images succeed you run a COMMIT. Consider the same with the images outside of the database and the delete of the third image fails requiring that the transaction be rolled back. How do you handle undeleting the other two images?

Case 2. Two transactions trying to run at the same time. One will delete an image if nothing else references it and the other is going to add a reference to that image. With everything inside the database the first of these transactions will place a lock on the data it references which will prevent the second transaction starting until the first finishes. So if the transaction to do the delete runs first then the image is deleted and the second transaction rolls back because the image it is trying to reference doesn't exist. If the transaction to add the reference runs first then the other doesn't delete the image because there is another reference to it. With the image outside of the database there may be nothing that the two queries have in common within the database to actually prevent them both running together and so one deletes the image at the same time as the other adds a reference to it resulting in a corrupted database.

Case 3. You have an image that should only be accessible to people who are logged in. With the image inside the database you simply include a test for them being logged in when you write the WHERE clause for the query to generate the image that is generated out of the database. If the image is outside of the database then the image needs to be stored somewhere that cannot be directly accessed so as to require a script to access it. That script then needs to perform the same test to check if they are logged in as the first script would and only supply the image when that condition is true. A script to display the image is needed for both of these alternatives and both need to do a database lookup. The separate file version needs extra code to actually read the content of the image into the script in order to process it making the code needed to handle a separate image more complex than the code to display it directly from the database. This is before you start considering what extra relationships you may need to test in order to maintain the integrity of the data where you need extra code elsewhere to make sure that the database never refers to images that don't exist and that images get deleted when the database no longer references them.

These examples are fairly simple ones that can easily occur even in relatively small applications. Consider how many more complex examples like these would apply if you have a medium to large application involving dozens or hundreds of tables, terabytes of data and millions of images.

While the benefits of keeping the images outside of the database may sometimes be the dominant factors I do not believe that they are as dominant as some people appear to think. Those who argue that the images should always be kept separate are arguing from ignorance. Ask them how they propose to handle the above simple situations when they arise. I'd be particularly interested in hearing how they will handle undeleting an image. Most of the situations where it is better to keep the images separate would be in the smallest 5% of all the applications that exist and mostly concentrated at the tiny end of that range. To be able to process images in any moderately sized application you almost have no choice but to include them in the database just in order to avoid the millions of lines of extra code you would need to maintain the data integrity if you put them outside.


This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow