I finally have the first completely working prototype of the fill-database program, sort of…
This past week was spent getting the diff-comments to work properly, tidying up the code & solving an indentation problem. The diff-comments were difficult because I used the UploadDiff function, which never gave me a direct handle on any diff (FileDiff) itself; I would only get back a DiffSet object, something very different from a FileDiff. However, there is something called a Foreign Key that describes a many-to-one relationship between objects, and that relationship can also be followed in reverse. I had read about the Foreign Key before, but I didn’t realize its full potential. Since the FileDiff object has a foreign key to the DiffSet, I can reach the appropriate FileDiff objects through a DiffSet object. So by accessing diffset.files I’m able to get hold of the FileDiff objects that I needed.
The only catch is accessing this data: this type of access returns a Manager Object, so you can’t simply access diffSet.files.etc; you can only retrieve the data through a limited number of Manager Object functions. To solve this I created a loop that simply exits on the first iteration, which gives me the first and appropriate FileDiff object (the remaining objects in the manager are variations of the first one). The only bad thing about this is that it looks like a pretty bad cheat, but it works quite well until a better solution can be found. I did a bunch of looking and it’s the only solution I could find.
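To make the loop trick concrete, here is a minimal sketch in plain Python. It does not use Django or the real Review Board models: FakeManager and FakeDiffSet are hypothetical stand-ins that only imitate the shape of diffset.files, and the strings stand in for FileDiff objects. It shows the exit-on-first-iteration loop next to an equivalent one-liner that reads a little less like a cheat.

```python
class FakeManager:
    """Hypothetical stand-in for the manager found at diffset.files."""

    def __init__(self, items):
        self._items = list(items)

    def all(self):
        # A real manager would run a database query here.
        return iter(self._items)


class FakeDiffSet:
    """Hypothetical stand-in for a DiffSet with related FileDiffs."""

    def __init__(self, filediffs):
        self.files = FakeManager(filediffs)


def first_filediff_loop(diffset):
    # The approach from the post: loop, but leave on the first iteration.
    for filediff in diffset.files.all():
        return filediff
    return None  # the manager held no objects


def first_filediff_next(diffset):
    # Same result without a visible loop: take one item or fall back to None.
    return next(iter(diffset.files.all()), None)


diffset = FakeDiffSet(["filediff-1", "filediff-2"])
empty = FakeDiffSet([])
first_by_loop = first_filediff_loop(diffset)
first_by_next = first_filediff_next(diffset)
```

Both helpers return the first object and fall back to None when the manager is empty, so the calling code does not need to care which one is used.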
Since then I resolved the indentation problem using some subfunctions, tidied up the code, and started testing. Simple tests succeeded, but anything with a significant amount of data (users=100, reviews=10, diffs=10, …) took many hours to run & would eventually run out of memory. Christian mentioned that Django uses transactions for its database interaction and that I’d need to flush the transactions every so often. So I did a bit of reading and discovered that there are several ways to have a transaction commit itself, which others noted improved their performance quite a bit. Defining an autocommit at the start of the function made a great deal of difference: 10 diff-comments now take seconds rather than minutes. Unfortunately there are still memory errors that only show up when running large tests for a number of hours. After much more reading I’ve found what some have termed ‘Django memory leaking’. I suspect it’s actually the Django queries that are being stored, since I can’t seem to find any information on something like a transaction cache, but I’ll have to run some long-running tests to know for sure.
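The idea of flushing the transaction every so often can be sketched without Django, using sqlite3 from the standard library. This is only an illustration of periodic commits, not the code in fill-database: the comment table, the fill_comments helper and the batch size of 100 are all assumptions made for the example.

```python
import sqlite3


def fill_comments(conn, total, batch_size=100):
    """Insert `total` rows, committing after every `batch_size` rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment (id INTEGER PRIMARY KEY, text TEXT)"
    )
    commits = 0
    for i in range(total):
        conn.execute("INSERT INTO comment (text) VALUES (?)", (f"comment {i}",))
        if (i + 1) % batch_size == 0:
            conn.commit()  # flush the pending transaction
            commits += 1
    conn.commit()  # flush whatever is left over from the last partial batch
    return commits + 1


conn = sqlite3.connect(":memory:")
n_commits = fill_comments(conn, total=250, batch_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

With 250 rows and a batch size of 100 this commits three times in total rather than once at the very end. As for the stored queries, one thing worth checking is the DEBUG setting: with DEBUG on, Django keeps a log of every executed query in django.db.connection.queries, and django.db.reset_queries() clears that log.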
So in the meantime I’ve written up some test cases and created a bunch of large diff files that are needed for testing purposes. Then, at last, it will be time to post it on Review Board & get some much needed feedback! I look forward to the review; it will be nice to make sure this program is up to snuff & do a final commit! I suspect there will be a number of changes needed, so I’m not going to be surprised if I see a lot of comments on Review Board.