Deep Learning

briannemsick/barrage — Barrage is an opinionated supervised deep learning...

🍂 This pet project is in recognition of the upcoming 40th anniversary of Star Trek the Motion Picture (TMP), which I thought could also serve as a geeky experiment to test evolving deep learning technology. One of the criticisms often levelled against TMP is that it is heavy on special effects and lacks some of the camaraderie of the original series (TOS). I wondered whether there were any small tweaks that could be made to certain scenes using Deepfake or Deep Video Portrait technology to nudge the mood slightly.

I settled my focus on scenes involving the actress Grace Lee Whitney []. She is a personal favourite of mine, having been the female lead in season one of TOS, who was fired, allegedly after being sexually assaulted by an executive, and who was asked to reprise her role in the franchise for TMP. Her appearance is little more than a cameo with a couple of lines of dialogue, only one of which is delivered to the camera. However, she also gets a cameo in Star Trek III which carries quite a bit of emotional resonance, and most of that appearance is in close-up and therefore potentially useful for Deep Video Portrait technology. So this is a proposal to test the possibility of Star Trek the Motion Picture: the Rand Cut.

**********

Rand's final scene in TMP is the one where Dr McCoy beams aboard []. This scene is normally shortened in most edits because some of the dialogue makes light of an earlier disaster in which two crewmen die horribly; the flippant tone and inane grins are inappropriate considering the awful tragedy that has just taken place. I wondered if the full scene could benefit from Deepfake / Deep Video Portrait technology to make any or all of the following changes:

1. Make Kirk and Rand look more serious until after McCoy arrives. A yeoman beams aboard and tells Kirk that McCoy "insisted we go first to see how it scrambled our molecules," which leads Kirk and Rand to smile.
If the smirking is replaced with serious faces and Kirk delivers his "That has a familiar ring to it" line with a straight face, it adds a double meaning to his comment (it acknowledges the seriousness of the actual disaster at the same time as McCoy's fan-favoured reluctance to trust the transporter) as well as being consistent with Kirk's long-established gallows humour. NB: The earlier scene where the disaster takes place has lengthy close-ups of Kirk and Rand looking very serious, which could work as source material for Deep Video Portrait adjustment. []

2. Insert a line of dialogue for Janice Rand. In the fan-created episode World Enough and Time (WEAT) [], Rand says to Sulu, "Don't look so worried." Inserting this line into TMP after the yeoman speaks but before Kirk's line could also work on multiple levels if delivered with a more serious face (reassuring Kirk that the problem that caused the earlier disaster has now been fixed, a show of empathy alluding to Rand's previous close working relationship with Kirk, and an acknowledgement that Kirk is nervous about his reunion with McCoy). NB: WEAT was filmed many years after TMP, when Grace was in her seventies, so it seems that Deep Video Portrait might be needed to make the younger actress's mouth move appropriately in the TMP footage. NB: Although the line is brief, there may not be quite enough of a pause between the yeoman's and Kirk's lines to fit in the extra dialogue. Another possibility is to insert it off camera, just after McCoy arrives. (I don't think it works quite as well there, as it feels like she should say, "Don't look so worried, Dr McCoy," which would require extra splicing from TOS dialogue from either Charlie X or The Corbomite Manoeuvre (where she mentions McCoy by name), and splicing her 70-year-old voice with her 36-year-old voice might sound odd, albeit still possible with more sound-editing effort and less Deep Video Portrait effort.)
A third option is to insert it into the brief shot where she is smiling after McCoy's arrival, just before Kirk steps out of the booth (my concern here is that, again, it seems to make light of the earlier disaster).

Grace only appears in the first third of TMP, and it seems a shame that she vanishes so early on after such a long absence from the franchise due to an endemic problem only recently given public recognition by the Me Too movement. The second, more ambitious, fantasy edit would be to digitally add the actress into some later scenes.

One possibility is to insert a couple of close-ups into the very long V'Ger flyover. Rand is not established to be on the bridge at this point, but there are a lot of open-mouthed stares from other characters in this scene, so I'm sure another few from Rand would not be out of place. The main problem here is finding a suitable clear background shot on which to paste her image.

Another option is to insert a brief rear shot from the Rec Deck scene into the scene where Kirk is being dressed in a space suit to pursue Spock []. The problem here, I think, is that the scene might be a bit too drawn out to use the brief rear shot from the earlier scene effectively.

Not wanting to be overly ambitious, I picked a scene in the latter part of the movie where Kirk and McCoy are watching Decker and the Ilia probe on a viewscreen in Kirk's quarters. Earlier in the movie, McCoy chews out Kirk and leaves these same quarters. As McCoy heads out the door, he becomes small enough to be obscured if a close-up of Rand were digitally added in the foreground. So for this more complex edit I wondered if it would be possible to do the following:

1. Paste a close-up of Rand into a static image of Kirk's quarters. After the establishing shot of Kirk and McCoy watching the viewscreen, cut to the new view of the quarters, angled facing the door, taken from the earlier scene.
Paste a digital copy of Rand in the foreground (to obscure McCoy's original exit) and establish her presence in the room.

2. Insert a line of dialogue for Janice Rand. Use Deep Video Portrait to allow her to speak a line of dialogue from the TOS episode The Balance of Terror: "Can I get you something from the galley, sir?" It's trivial, but she used to be his yeoman, with knowledge that he forgets to eat in a crisis (The Corbomite Manoeuvre), and she also has a history of spending her personal time delivering food to friends who are working overtime (The Man Trap). The original line in Balance of Terror was delivered in close-up, so it might be fairly easy to port across onto an older, TMP-era version of the actress.

3. Insert a line of dialogue for Kirk. Insert an off-camera "No thank you," "Thank you," or some such from Kirk. This would allow the focus to stay on the close-up of Rand, minimising any complicated cuts to the original footage. If it's possible for her to show facial recognition of his reply (pursed lips or a slight nod, perhaps), it might add to the realism of the exchange.

4. Insert a line of dialogue for Janice Rand. After a slight pause, use Deep Video Portrait to allow her to speak a line of dialogue from the TOS episode Miri: "Do you suppose she knows?"

5. Insert a line of dialogue for Dr McCoy. Cut back to the original footage but add the Dr McCoy line from the TOS episode Spock's Brain, "Jim, she may not remember, or even really know," and then let the remainder of the scene play out as normal.

6. Add a reaction shot of Rand. Although not strictly needed, you could add a second, brief reaction shot of Rand after Kirk thumps the table, to show her own shock or disappointment when Decker's line of questioning (viewed by them on the screen) fails.

As a final edit, I would love Rand to be featured walking onto the bridge or shown as present on the bridge in the final scene of TMP.
This would mark the only moment, apart from the final scene in Star Trek IV, when all nine original main characters are present in the same scene. Unfortunately, I don't think there is a suitable shot anywhere else in the movie that could work here.

So there are some ideas of mine for 'the Rand Cut'. Does anyone have any thoughts, or ideas for code that could make some or all of these scenes viable for insertion into a movie edit?
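For anyone wanting to experiment, one concrete building block common to face-swap and face-reenactment pipelines is aligning the source actor's facial landmarks to the target frame before blending. Below is a minimal sketch of just that alignment step for 2-D landmarks (a similarity transform estimated with the Umeyama method), using only NumPy. The landmark arrays are assumed to come from a separate detector such as dlib or MediaPipe, which is not shown here; this is an illustrative sketch, not any particular project's code.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Estimate scale s, rotation R, translation t so that
    dst ≈ s * R @ src + t (Umeyama method, 2-D points)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s = src.mean(axis=0)
    mu_d = dst.mean(axis=0)
    sc = src - mu_s          # centred source landmarks
    dc = dst - mu_d          # centred destination landmarks
    cov = dc.T @ sc / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Guard against a reflection solution.
    d = 1.0 if np.linalg.det(U) * np.linalg.det(Vt) > 0 else -1.0
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    var_s = (sc ** 2).sum() / len(src)
    scale = (S * np.diag(D)).sum() / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t

# Demo: recover a known transform from four corner "landmarks".
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])
dst = 2.0 * src @ R_true.T + np.array([3.0, -1.0])
scale, R, t = estimate_similarity(src, dst)
```

With the recovered scale, rotation and translation, each source frame can be warped onto the target face region before colour correction and blending. A full Deepfake or Deep Video Portrait pipeline involves far more (identity encoders, rendering, temporal smoothing), so treat this only as a first step.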
🍂 Image Classifier

Going forward, AI algorithms will be incorporated into more and more everyday applications. For example, you might want to include an image classifier in a smartphone app. To do this, you'd use a deep learning model trained on hundreds of thousands of images as part of the overall application architecture. A large part of software development in the future will be using these types of models as common parts of applications.

In this project, you'll train an image classifier to recognize different species of flowers. You can imagine using something like this in a phone app that tells you the name of the flower your camera is looking at. In practice, you'd train this classifier, then export it for use in your application. We'll be using this dataset of 102 flower categories. When you've completed this project, you'll have an application that can be trained on any set of labelled images. Here your network will be learning about flowers and will end up as a command-line application, but what you do with your new skills depends on your imagination and effort in building a dataset. This is the final project of the Udacity AI with Python Nanodegree.

Prerequisites

The code is written in Python 3.6.5. If you don't have Python installed you can find it here. If you are using a lower version of Python you can upgrade it, ensuring you have the latest version of pip.

To install pip, run in the command line: python -m ensurepip --default-pip

To upgrade pip: python -m pip install --upgrade pip setuptools wheel

Note that pip cannot upgrade Python itself; to upgrade Python, install a newer release from python.org.

Additional packages that are required are: NumPy, Pandas, Matplotlib, PyTorch, Pillow (PIL) and json. You can download them using pip (pip install numpy pandas matplotlib pillow) or conda (conda install numpy pandas matplotlib pillow). In order to install PyTorch, head over to the PyTorch site, select your specs and follow the instructions given.
Viewing the Jupyter Notebook

In order to better view and work on the Jupyter notebook I encourage you to use nbviewer. You can simply copy and paste the link into that website and you will be able to view it without any problem. Alternatively, you can clone the repository using git clone; then, after you have installed Jupyter Notebook, type jupyter notebook in the command line, locate the notebook and run it.

Command Line Application

Train a new network on a data set.

Basic usage: python data_directory
Prints out the current epoch, training loss, validation loss, and validation accuracy as the network trains.

Options:
Set directory to save checkpoints: python data_dir --save_dir save_directory
Choose architecture (alexnet, densenet121 or vgg16 available): python data_dir --arch "vgg16"
Set hyperparameters: python data_dir --learning_rate 0.001 --hidden_layer1 120 --epochs 20
Use GPU for training: python data_dir --gpu gpu

Predict the flower name from an image, along with the probability of that name. That is, you'll pass in a single image /path/to/image and get back the flower name and class probability.

Basic usage: python /path/to/image checkpoint

Options:
Return top K most likely classes: python input checkpoint --top_k 3
Use a mapping of categories to real names: python input checkpoint --category_names cat_to_name.json
Use GPU for inference: python input checkpoint --gpu

Json file

In order for the network to print out the name of the flower, a .json file is required. If you aren't familiar with json you can find information here. By using a .json file the data can be sorted into folders with numbers, and those numbers will correspond to specific names specified in the .json file.

Data and the json file

The data used for this assignment (a flower database) are not provided in the repository, as they are larger than what GitHub allows. Nevertheless, feel free to create your own databases and train the model on them to use with your own projects.
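The training options above can be made concrete with a small argparse sketch. This is a hypothetical reconstruction, not the project's actual code: the function name, defaults, and the simplification of --gpu to a boolean flag are my own assumptions.

```python
import argparse

def build_train_parser():
    """Hypothetical reconstruction of the training CLI described above;
    the real project's script and defaults may differ."""
    p = argparse.ArgumentParser(description="Train a flower image classifier")
    p.add_argument("data_dir",
                   help="folder containing train/validate/test subfolders")
    p.add_argument("--save_dir", default=".",
                   help="directory in which to save checkpoints")
    p.add_argument("--arch", default="vgg16",
                   choices=["alexnet", "densenet121", "vgg16"],
                   help="pretrained feature-extraction architecture")
    p.add_argument("--learning_rate", type=float, default=0.001)
    p.add_argument("--hidden_layer1", type=int, default=120)
    p.add_argument("--epochs", type=int, default=20)
    # Simplified to a flag; the README's `--gpu gpu` form takes a value.
    p.add_argument("--gpu", action="store_true",
                   help="train on GPU if available")
    return p

args = build_train_parser().parse_args(
    ["flowers", "--arch", "densenet121", "--epochs", "15"])
print(args.arch, args.learning_rate, args.epochs)
```

A prediction script would follow the same pattern with positional image-path and checkpoint arguments plus --top_k, --category_names and --gpu options.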
The structure of your data should be the following: the data need to comprise three folders: test, train and validate. Generally the proportions should be 70% training, 10% validation and 20% test. Inside the train, test and validate folders there should be folders bearing a specific number which corresponds to a specific category, clarified in the json file. For example, if we have the image a.jpg and it is a rose, it could be at a path like /test/5/a.jpg, and the json file would contain an entry like {..., "5": "rose", ...}. Make sure to include a lot of photos of your categories (more than 10) with different angles and different lighting conditions, in order for the network to generalize better.

GPU

As the network makes use of a sophisticated deep convolutional neural network, training is impractically slow on a typical laptop CPU. In order to train your models you have three options:

Cuda -- If you have an NVIDIA GPU then you can install CUDA from here. With CUDA you will be able to train your model, although the process will still be time-consuming.
Cloud services -- There are many paid cloud services that let you train your models, like AWS or Google Cloud.
Google Colab -- Google Colab gives you free access to a Tesla K80 GPU for 12 hours at a time. Once 12 hours have elapsed you can just reload and continue! The only limitation is that you have to upload the data to Google Drive, and if the dataset is massive you may run out of space.

However, once a model is trained, a normal CPU can be used for inference and you will have an answer within seconds.

Hyperparameters

As you can see, you have a wide selection of hyperparameters available, and you can get even more by making small modifications to the code. Thus it may seem overly complicated to choose the right ones, especially if the training needs at least 15 minutes to be completed.
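The folder layout described above can be produced automatically. Here is a small illustrative script (the function name and the exact rounding of the 70/10/20 split are my own choices, not part of the project) that copies a one-folder-per-category dataset into train/validate/test splits:

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir, dst_dir, seed=0):
    """Copy a one-folder-per-category dataset (src_dir/<category>/<image>)
    into train/validate/test splits using roughly 70/10/20 proportions."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for cls in sorted(Path(src_dir).iterdir()):
        if not cls.is_dir():
            continue
        files = sorted(p for p in cls.iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(0.7 * len(files))
        n_validate = int(0.1 * len(files))
        buckets = {
            "train": files[:n_train],
            "validate": files[n_train:n_train + n_validate],
            "test": files[n_train + n_validate:],
        }
        for split, items in buckets.items():
            out = Path(dst_dir) / split / cls.name
            out.mkdir(parents=True, exist_ok=True)
            for f in items:
                shutil.copy2(f, out / f.name)
```

For example, split_dataset("raw_flowers", "flowers") would turn raw_flowers/5/a.jpg into flowers/train/5/a.jpg (or validate/test), ready for the directory layout the training command expects.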
So here are some hints:

By increasing the number of epochs, the accuracy of the network on the training set gets better and better; however, be careful, because if you pick too large a number of epochs the network won't generalize well, that is to say it will have high accuracy on the training images and low accuracy on the test images. E.g. training for 12 epochs: training accuracy 85%, test accuracy 82%; training for 30 epochs: training accuracy 95%, test accuracy 50%.
A big learning rate makes the network converge quickly to a small error, but it will constantly overshoot.
A small learning rate lets the network reach greater accuracies, but the learning process will take longer.
Densenet121 works best for these images, but the training process takes significantly longer than with alexnet or vgg16.

My settings were lr=0.001, dropout=0.5, epochs=15, and my test accuracy was 86% with densenet121 as my feature-extraction model.

Pre-Trained Network

The checkpoint.pth file contains the information of a network trained to recognise 102 different species of flowers. It has been trained with specific hyperparameters, so if you don't set them right the network will fail to load. In order to get a prediction for an image located at /path/to/image using my pretrained model you can simply type python /path/to/image checkpoint.pth

Contributing

Please read the contribution guidelines for the process for submitting pull requests.

Authors

Shanmukha Mudigonda - Initial work
Udacity - Final Project of the AI with Python Nanodegree