Here lies the open source version of Saiy for Android, dependent on Google Play Services, which demonstrates how a Virtual Assistant functions, from start to finish.
Saiy is a many-times-rebuilt version of its previous incarnation, utter! Countless attempts, and the experience gained in getting such an application to function on Android, have brought me to a point where I feel it's time to open source the code, for many reasons.
After spending a few years rewriting the code base, I think I have finally got it to a stage where it could be considered 'scalable'. That may not be the correct terminology, given the infinite possibilities of natural language requests and the finite number of actions that could be coded to resolve them. Nevertheless, I have adapted the application to integrate numerous APIs for Text to Speech, Speech to Text, Natural Language Understanding, Machine Learning and Cognitive Services (such as emotion analysis and vocal identification), in the least spaghetti-like way I could achieve, and I have written my own APIs for developers to integrate their applications, functions and services. It's a case of now or never to open source it, whilst the implementations of these connected services and APIs are functional and up-to-date.
The project itself is too large and the possibilities it presents are too great for just a lone developer and it would be great to see what the community can do with it - iterating at a pace that no other similar application can keep up with.
Additionally, most of us find AI and the future of our virtual assistants and smart tech pretty fascinating, but getting involved requires a number of technical stepping stones. I hope that publishing this code, together with the ease with which commands can be created and adapted, using either simple String matching, a cloud-based solution or your own NLP implementation, will allow many to dive straight in and therefore further their interest.
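To give a feel for the simple String matching route, here is a minimal sketch in plain Java. It is not taken from the Saiy code base: the class name SimpleCommandMatcher, the Command values and the keyword table are all assumptions made up for illustration, and a real command would need far more care (locale handling, synonyms, confidence scores).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: none of these names come from the Saiy code base. */
final class SimpleCommandMatcher {

    /** Illustrative command identifiers only. */
    enum Command { WEATHER, ALARM, UNKNOWN }

    // Keyword -> command table. Insertion order decides which keyword wins.
    private static final Map<String, Command> KEYWORDS =
            new LinkedHashMap<String, Command>();

    static {
        KEYWORDS.put("weather", Command.WEATHER);
        KEYWORDS.put("forecast", Command.WEATHER);
        KEYWORDS.put("alarm", Command.ALARM);
        KEYWORDS.put("wake me", Command.ALARM);
    }

    private SimpleCommandMatcher() {
    }

    /** Returns the first command whose keyword appears in the utterance. */
    static Command resolve(String utterance) {
        if (utterance == null) {
            return Command.UNKNOWN;
        }
        String normalised = utterance.toLowerCase(Locale.US).trim();
        for (Map.Entry<String, Command> entry : KEYWORDS.entrySet()) {
            if (normalised.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return Command.UNKNOWN;
    }
}
```

Calling SimpleCommandMatcher.resolve("Wake me at seven") would return Command.ALARM in this sketch. Swapping the keyword table for a cloud-based or custom NLP resolver would only change the body of resolve.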
The project is licensed under the GNU Affero General Public License v3. This is a copyleft license. See LICENSE.
I have selected this license over any other to ensure that any adaptations or improvements to the code base must be published under the same license. This will protect any hard work from being adapted into closed-source projects without giving back.
The license grant is not for Saiy's trademarks, which include the logo designs. Saiy reserves all trademark and copyright rights in and to all Saiy trademarks.
Copyright © 2017 Saiy® Ltd.
I need to clarify the most appropriate for the GNU Affero General Public License - will revisit very soon. Any suggestions welcome.
Network & Embedded Text to Speech
See Saiy TTS project for further integration examples.
Network & Native Speech to Text
Natural Language Processing
The project is built using Java 7 - Android SDK (API 26) - Android NDK
Using Android Studio, it can be imported as a new project via version control or the downloadable zip.
At the risk of stating the obvious, when testing on a physical device, the performance of the code depends heavily on the hardware specifications, more so than for your average app, as there is a lot going on.
Installing the Google Text to Speech Engine on your test device is recommended, due to the features it provides.
To use free embedded and offline Voice Recognition, install Google's 'Now' application. If you have a Samsung device, their Vlingo recognition service does not work correctly for external applications.
Please use the Stack Overflow tag for compilation-related questions and errors.
For code issues and crashes, please open an issue.
For discussion, please use the XDA development thread for now.
In all major areas of the code, I will attempt to add further README files with a more specific explanation, including TO-DOs, issues and required improvements. Check the subdirectories of the code to see if a README is present, or a placeholder letting you know that one should be added soon.
Briefly, there are two major classes in the app that direct and distribute work elsewhere:
SelfAware is the main Foreground Service, responsible for managing the application state and channelling voice recognition, text to speech and other API requests.
Quantum is the main processing class, where commands are locally resolved (if required), checked for sensibility and actioned.
Understanding the above two classes is essential to following the flow of the full application logic.
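As a rough mental model of that flow, the sketch below shows one class coordinating recognition and speech, and another resolving, checking and actioning a command. It is a deliberately tiny, hypothetical illustration in plain Java: CoordinatorSketch and ProcessorSketch are made-up names, not the real SelfAware or Quantum APIs, and there is no Android, threading or actual speech involved.

```java
import java.util.Locale;

/** Hypothetical stand-in for the processing role described for Quantum. */
final class ProcessorSketch {

    /** Resolve locally, check the result is sensible, then action it. */
    String process(String utterance) {
        String command = resolveLocally(utterance);
        if (!isSensible(command)) {
            return "Sorry, I didn't understand that.";
        }
        return action(command);
    }

    // Toy local resolution: only one command is recognised in this sketch.
    private String resolveLocally(String utterance) {
        if (utterance == null) {
            return "";
        }
        String text = utterance.toLowerCase(Locale.US);
        return text.contains("time") ? "TIME" : "";
    }

    // Toy sensibility check: an empty command means nothing was resolved.
    private boolean isSensible(String command) {
        return !command.isEmpty();
    }

    // Only "TIME" can reach this point here; real actions are hard-coded.
    private String action(String command) {
        return "Telling you the time.";
    }
}

/** Hypothetical stand-in for the coordinating role described for SelfAware. */
final class CoordinatorSketch {

    private final ProcessorSketch processor = new ProcessorSketch();
    private String lastSpoken = "";

    /** Imagined callback for when speech recognition produces a result. */
    void onRecognitionResult(String utterance) {
        speak(processor.process(utterance));
    }

    // A real service would hand the text to a Text to Speech engine.
    private void speak(String text) {
        lastSpoken = text;
    }

    String getLastSpoken() {
        return lastSpoken;
    }
}
```

The point of the split is that the coordinator never needs to know how a command was resolved, only what should be spoken or done next.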
For the sake of testing ease, the code points to static API keys and secrets held in the configuration directory. It should probably go without saying, don't do this in production code.
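One common alternative, sketched here under assumptions rather than taken from this project, is to load keys at runtime from a properties source that is kept out of version control. The class name ApiKeyLoader and the property name nlu.api.key are hypothetical, and on Android you may well prefer a build-time or server-side approach instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical helper: not part of the Saiy code base. */
final class ApiKeyLoader {

    private ApiKeyLoader() {
    }

    /**
     * Reads a single key from a properties source (for example a file that
     * is excluded from version control) and fails loudly if it is missing.
     */
    static String load(Reader source, String name) throws IOException {
        Properties properties = new Properties();
        properties.load(source);
        String value = properties.getProperty(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing API key: " + name);
        }
        return value.trim();
    }
}
```

For example, ApiKeyLoader.load(new java.io.StringReader("nlu.api.key=abc123"), "nlu.api.key") returns "abc123", while asking for a name that is absent throws an IllegalStateException instead of silently continuing with an empty key.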
This dates back to my original code written as a beginner for utter! That said, it is only relatively recently that cloud services such as API.ai or embedded options became available/usable as well as manageable for an individual developer.
Up until this point, Java libraries that attempted the equivalent bloated and lagged the app to the point of stalling. They were not an option.
I am also mindful that I would like developers of any experience level to be able to contribute to and manipulate the code with ease. Basic String operations can be converted by others over at SaiyeyMcSaiyface.
I use the word 'scalability' with caution. Whatever strides machine learning has taken up until now, there is still a requirement for a human to hard-code the action that is ultimately performed. Whether this be turning up your heating, or the generic layout design of a weather request, someone still needs to write that code and the surrounding error handling.
Whilst standardised templates that organise the world of information around us can help to categorise output into a set of static response mechanisms, we perhaps can't feasibly use the word scalable until a machine can dynamically write code (or the equivalent) for itself.
The above is for another discussion, but the point to take is that development is currently consigned to the following:
Much of the above will need to be hard-coded. I state this only to manage expectations of how 'smart' we can currently hope to be...
Initially, I have published only the core of the application, so it may be critiqued in terms of its structure and quality of code. Much of the fundamental construct of the app, and the code style/quality used, is repeated across the 500K+ lines still to be pushed.
Once there is general consensus on the application core, I will begin to upload the remaining code, with any suggested alterations already in place.
I am entirely self taught in Java, so go easy on me!
The code was originally written in Java 8, but had to be reverted to Java 7 due to build issues with the Jack compiler.
Now that Jack is deprecated, I plan to revisit this soon.