I've already written a little about what my WPF application looks like, but I'm going to finish off here since I made some changes to it. By the way: a good reason to write your own is that using "the playground" will cost you. There are limits, I found out after testing it. You have a certain amount of credits, and they run out really fast if you're "messing around". Running an application against their API costs less, from what I experienced yesterday.
The DALL-E API can be divided into three parts:
- Image generation, based on prompt. Generates an image based on your prompt. Options: number of images and size
- Image variations, based on uploaded image. Creates variations of the image that you choose. Options: number of images and size
- Image edit, based on prompt, original image, and image mask. Options: mask. (Not sure how this works without a mask)
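To make the three parts above concrete, here is a minimal sketch that maps each one to its endpoint. The URLs are the public OpenAI image endpoints; the enum and class names (DalleOperation, DalleEndpoints) are made up for this example and are not taken from my application.

```csharp
using System;

// Hypothetical names for illustration only.
public enum DalleOperation { Generation, Variation, Edit }

public static class DalleEndpoints
{
    private const string BaseUrl = "https://api.openai.com/v1/images/";

    // Returns the endpoint that belongs to one of the three parts of the API.
    public static string UrlFor(DalleOperation op) => op switch
    {
        DalleOperation.Generation => BaseUrl + "generations", // JSON body: prompt, n, size
        DalleOperation.Variation  => BaseUrl + "variations",  // multipart: image, n, size
        DalleOperation.Edit       => BaseUrl + "edits",       // multipart: image, mask, prompt, n, size
        _ => throw new ArgumentOutOfRangeException(nameof(op))
    };
}
```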
My solution has one parent class called "Dalle". I was thinking that I'd make a namespace called Dalle, but since my original code from the first draft had a certain hierarchy, this was not preferable. The Dalle-class has in general 4 child-classes:
- requestImage, serializes to a json-string
- requestImageVariation, posts multipart form-data
- requestImageEdit, posts multipart form-data
- responseImage, imageData, receives data from openai-endpoints defined by 1, 2, 3
I made some other classes too: one for options and one that creates a temporary window for images. I know that some of these solutions are not the best, but they work. It might be a little messy, since I don't have time to plan properly (or rather, I'm not making time). I'm learning as I go, one might say. The application was written in the same order as the list above: (1), (2) and (3). (4) has always existed, since it's a must. All images are opened in temporary floating windows, which can be saved or closed. They are detached from the main application. I was actually thinking that another solution, where the images are in a shared placeholder attached to the main window, would be better, but I'm trying to keep it small. Or maybe all images should be placed in the same window as icons. We'll see what the future brings. For the moment I have to prioritize because of lack of time. No one is paying me to do this.
(4) is a common placeholder for all classes sending a request and receiving data. It's very simple actually. See below:
The imageData class receives a URL where you can download the data from.
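As a rough sketch of what such a placeholder could look like: the classes below mirror the JSON that the image endpoints return (a "created" timestamp and a "data" array of objects with a "url"). The class and property names here are assumptions for illustration and may differ from the ones in my application.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for the imageData class: one entry per generated image.
public class ImageData
{
    [JsonPropertyName("url")]
    public string Url { get; set; }
}

// Hypothetical stand-in for the responseImage class.
public class ResponseImage
{
    [JsonPropertyName("created")]
    public long Created { get; set; }

    // Stays null when the server answers with an error object instead of images.
    [JsonPropertyName("data")]
    public List<ImageData> Data { get; set; }
}

public static class ResponseParser
{
    // Deserializes the response body into the placeholder classes.
    public static ResponseImage Parse(string json) =>
        JsonSerializer.Deserialize<ResponseImage>(json);
}
```

Because Data stays null for an error body, a simple null check is enough to tell a successful answer from a rejected one.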
Before getting images, one must set the right values. I've made a class OptionsImage, which is connected to the application's controls so that all values stay updated:
public class OptionsImage
{
    public int noImages;
    public string csize;
    public string requestURL;
    //public string responseFormat;
    public readonly int[] optImages = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    public readonly string[] optSize = { "256x256", "512x512", "1024x1024" };

    public OptionsImage()
    {
        //set default values
        noImages = 1;
        requestURL = url_image_generations;
        csize = optSize[0];
        // responseFormat = "url";
    }
}
optImages contains the values shown in a combobox, and the same goes for optSize. responseFormat is commented out because the API returns a URL by default. That's what I wanted. Easier, I think.
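Here is a minimal sketch of how the option values could end up in the JSON body of an image generation request. The field names "prompt", "n" and "size" are the ones the generations endpoint expects; the class names RequestImageBody and RequestBuilder are hypothetical and only used for this example.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical request body; my real requestImage class may look different.
public class RequestImageBody
{
    [JsonPropertyName("prompt")]
    public string Prompt { get; set; }

    [JsonPropertyName("n")]
    public int N { get; set; }

    [JsonPropertyName("size")]
    public string Size { get; set; }
}

public static class RequestBuilder
{
    // Copies the values chosen in the options (noImages, csize) into the body.
    public static string ToJson(string prompt, int noImages, string csize) =>
        JsonSerializer.Serialize(new RequestImageBody
        {
            Prompt = prompt,
            N = noImages,
            Size = csize
        });
}
```

Calling RequestBuilder.ToJson("a cat", 1, "256x256") gives a small JSON object with those three fields, ready to be wrapped in a StringContent.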
All request classes (requestImage, requestImageVariation and requestImageEdit) have a public "async Task" function that posts the data. Since the data is different for each request, there had to be three different functions (from my point of view). But they are structured the same way. See below:
public async Task<HttpResponseMessage> PostFile(string key)
{
    using (var httpClient = new HttpClient())
    {
        httpClient.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", key);
        httpClient.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
        var jsonString = JsonSerializer.Serialize<Dalle.requestImage>(rxImages);
        var content = new StringContent(jsonString, Encoding.UTF8, "application/json");
        var res = await httpClient.PostAsync(url_image_generations, content);
        return res;
    }
}
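The two multipart variants follow the same pattern. As a sketch only, this is roughly how the form data for an image variation could be put together. The form field names "image", "n" and "size" are the ones the variations endpoint expects; the class name VariationRequest and its method names are assumptions for this example, not the code from my application.

```csharp
using System.Globalization;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for illustration.
public static class VariationRequest
{
    // Builds the multipart form: the PNG itself plus the two options.
    public static MultipartFormDataContent BuildContent(byte[] pngBytes, string fileName, int n, string size)
    {
        var form = new MultipartFormDataContent();

        var image = new ByteArrayContent(pngBytes);
        image.Headers.ContentType = new MediaTypeHeaderValue("image/png");
        form.Add(image, "image", fileName);

        form.Add(new StringContent(n.ToString(CultureInfo.InvariantCulture)), "n");
        form.Add(new StringContent(size), "size");
        return form;
    }

    // Posts the form with a bearer key and returns the raw response, like PostFile does.
    public static async Task<HttpResponseMessage> PostAsync(HttpClient client, string key, string imagePath, int n, string size)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/images/variations");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", key);
        request.Content = BuildContent(await File.ReadAllBytesAsync(imagePath), Path.GetFileName(imagePath), n, size);
        return await client.SendAsync(request);
    }
}
```

An image edit would add a "prompt" field and, optionally, a second file part named "mask" in the same way.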
It looks like I'm not handling any error messages from the endpoint, but that is done in the main application. The "PostFile" function returns the HttpResponseMessage, which is handled there. See below:
var responseString = await GlobalhttpResponse.Content.ReadAsStringAsync();
Dalle.resource = JsonSerializer.Deserialize<Dalle.responseImage>(responseString);
// there's an error. just get out.
if (Dalle.resource.data == null)
{
    txtDalleResponse.Text = "Server-response: \n" + GlobalhttpResponse + "\n\nError:\n" + responseString;
    this.IsEnabled = true;
    return;
}
An example of this is when I send a request that OpenAI does not accept, such as "Picture of Donald Trump". This creates an error. See below:
Server-response:
StatusCode: 400, ReasonPhrase: 'Bad Request', Version: 1.1, Content: System.Net.Http.HttpConnectionResponseContent, Headers:
{
Date: Fri, 07 Apr 2023 07:18:33 GMT
Connection: keep-alive
Access-Control-Allow-Origin: *
openai-version: 2020-10-01
openai-organization: user-vnadedmwziyalatk8ehvvp0w
X-Request-ID: 95ae77fb5b7f3beb0af025c65518a1b4
openai-processing-ms: 47
Strict-Transport-Security: max-age=15724800; includeSubDomains
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: 7b407f4adbebb4f9-OSL
Alt-Svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
Content-Type: application/json
Content-Length: 243
}
Error:
{
"error": {
"code": null,
"message": "Your request was rejected as a result of our safety system. Your prompt may contain text that is not allowed by our safety system.",
"param": null,
"type": "invalid_request_error"
}
}
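If one wanted to show only the readable part of such an error, the "message" field could be pulled out of the body. The following is a sketch under that assumption; the class name DalleError and the method TryGetMessage are hypothetical and not part of my application.

```csharp
using System.Text.Json;

// Hypothetical helper for illustration.
public static class DalleError
{
    // Returns the text of error.message, or null when the body has no such field.
    public static string TryGetMessage(string responseString)
    {
        try
        {
            using var doc = JsonDocument.Parse(responseString);
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty("error", out var error) &&
                error.ValueKind == JsonValueKind.Object &&
                error.TryGetProperty("message", out var message) &&
                message.ValueKind == JsonValueKind.String)
            {
                return message.GetString();
            }
        }
        catch (JsonException)
        {
            // The body was not JSON at all, for example an HTML error page.
        }
        return null;
    }
}
```

For the body shown above, this would return the sentence about the safety system, which is shorter to display than the full header dump.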
This handles server-side errors. But there are other types of errors, like timeouts (which happen from time to time), so the above code is put in a try-catch block. The catch part looks like this:
catch (Exception err)
{
    this.Dispatcher.Invoke(() =>
    {
        txtDalleResponse.Text = err.Message + "\nInnerexception: " + err.InnerException;
        this.IsEnabled = true;
    });
}
I'm not spending a lot of time on error handling, but at least I know that there IS an error, and I can see what it's about.
Here's an example of an error message concerning a timeout:
The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
InnerException: System.TimeoutException: The operation was canceled.
---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
--- End of inner exception stack trace ---
I'm not going to write much about image generation and image variations. They were quite straightforward. But here I probably could have done something instead of waiting 100 seconds for an answer. Not now.
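One thing that could be done about the long wait, as a sketch only: give each request its own, shorter deadline through a CancellationTokenSource instead of relying on the default HttpClient.Timeout of 100 seconds. The class name TimeoutExample and the method name PostWithTimeoutAsync are assumptions for this example.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration.
public static class TimeoutExample
{
    // Returns the response, or null when no answer arrived within the given time.
    public static async Task<HttpResponseMessage> PostWithTimeoutAsync(
        HttpClient client, string url, HttpContent content, TimeSpan timeout)
    {
        // The token is cancelled automatically once the timeout has elapsed.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await client.PostAsync(url, content, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // Covers TaskCanceledException, which HttpClient throws on cancellation.
            return null;
        }
    }
}
```

The caller can then show a short "no answer, try again" message when the result is null instead of the full stack trace.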
Dall-E - image edit
First of all, I want to say that all the images I was trying to edit had the size 256x256. I tried many times to write a prompt and erase areas of the images to make it work, but I did not feel like it was doing anything. Then, while I was trying to find "others", I tried the same thing in "the playground". It used 1024x1024 pictures. That's when things started to move.
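Since size seemed to matter, a small check before uploading might save some credits. The sketch below reads width, height and color type straight from the PNG header and tests what the edit endpoint documents as requirements: a square PNG under 4 MB, and a mask with the same dimensions as the image. Whether 1024x1024 is needed for good results is only my impression, so it is not checked here. The class name PngInfo and its methods are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration.
public static class PngInfo
{
    private static readonly byte[] Signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Reads the IHDR values; returns false if the bytes do not start like a PNG file.
    public static bool TryRead(byte[] file, out int width, out int height, out bool hasAlpha)
    {
        width = 0;
        height = 0;
        hasAlpha = false;
        if (file == null || file.Length < 26) return false;
        for (int i = 0; i < Signature.Length; i++)
            if (file[i] != Signature[i]) return false;

        // Bytes 16-19: width, 20-23: height (both big-endian), byte 25: color type.
        width = BinaryPrimitives.ReadInt32BigEndian(file.AsSpan(16, 4));
        height = BinaryPrimitives.ReadInt32BigEndian(file.AsSpan(20, 4));
        // Color types 4 and 6 carry an alpha channel. Palette images with
        // transparency (tRNS chunk) are not detected by this simple check.
        hasAlpha = file[25] == 4 || file[25] == 6;
        return true;
    }

    // True when image and mask look acceptable for an image edit request.
    public static bool LooksUsable(byte[] image, byte[] mask)
    {
        const int maxBytes = 4 * 1024 * 1024;
        return TryRead(image, out int w, out int h, out _)
            && TryRead(mask, out int mw, out int mh, out bool maskAlpha)
            && image.Length < maxBytes && mask.Length < maxBytes
            && w == h && w == mw && h == mh
            && maskAlpha;
    }
}
```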
To make it clearer which images I wanted to edit, I made a "preview area" below the options panel. I did not have that in the beginning, but found out that it was very practical. See picture below:
The upper image is the original image. The lower image is the mask. To make a mask, use paintshop, affinity or something similar to make parts of the image transparent.
My prompt was "the cat has different colored eyes". And there you have it: It worked! See below:
I'm not sure, but I think DALL-E wants 1024x1024 sized images as input. And this image is the only one that I can say was successful. It doesn't really work the way I expect. The cat picture worked as I expected, but on the next image I try, it either does nothing or the result is just plain "bad". It seems like it does not understand what you want. Here's another test, with a cat-dog:
I've removed the background, and the prompt is as follows: "Dog and a cat. They are now in a meadow looking at something that catches their eyes". The result is the following image:
At least the background changed. It looks more like a backyard and not a meadow. But hey. Who am I to judge...
And then it works again. Prompt "a cat and a dog sitting on a mountain, where you can see mountains and a blue sky in the background". Result:
Not sure what that orangey/brown blob is, but there is a mountain and a blue sky. I can see that. And then I tried with other animals: "a lot of pigeons". There are three. That's more than a couple. With my app I can also change the size and number of generated pictures. Same prompt, except that I have n=3 and size="256x256".
Ok. That's it for now. I'm finished with cats and dogs and pigeons. But I must say: It's a lot of fun. The cat is scary. It's missing a pupil on the left eye. One could probably edit that with a mask.