Before we start, I would like to make it clear that this is not meant as a tutorial. It is simply an explanation of why and how I did things in this little weekend project, written as practice in documenting my work. That said, if it is in any way useful to you as an example project for something similar, I am happy to answer any questions you may have.
This entire article and the images included in it are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The example project included alongside this project is licensed under a GNU General Public License version 3 or later.
The full source for this project can be found on GitHub.
If you haven't heard of it, livelife.green is a site I started with a few friends to write articles on, and enable discussion about, the Climate Crisis. The forums are an important feature of this site, and although our visitor numbers are increasing, user interaction remains non-existent.
One board on these forums, livelife.green/community/news, is dedicated to discussion of current affairs on eco-related topics. As a small weekend project I decided to create a 'bot' that checks news sites for articles about the climate crisis and related subjects and posts them on the news board. Creating a bot to do this is easy as pie, and a script could probably be written in fifteen minutes. However, rather than having the 'bot' post links straight onto the board, I wanted to curate its findings so that I could manually filter out topics that aren't closely related to subjects of interest, stub articles that contain only a sentence or two and don't say anything of value, and anything else I deem inappropriate. In theory this is very simple, but it did turn a tiny project into something a lot more extensive. This gave me the following requirements for this project:
1 - It must scan news sites of my choosing for potentially relevant news articles,
2 - It must present them to me in a simple way for curating,
3 - It must post links to approved articles on livelife.green.
Before jumping into writing code for this project I had to work out the best way for my bot to post topics on the forums. The first option was to create something that could post topics on the front end of the site. livelife.green runs Joomla as its CMS, and within that runs the Kunena component to provide forums, so my second option was to create something that could create a topic from within Joomla. The third option was to post topics directly into the site's database. I chose the third option for ease and started experimenting with creating topics from within the database. Unfortunately, creating a topic from the database on Kunena isn't as simple as dropping the topic into the 'topics' table: unless several related rows and counters are also set correctly, topics don't show up at all on the front end. That is fair enough, considering it was never designed to make posting outside of the front end easy. After some playing around I found the following lengthy procedure to be a successful way of creating topics…
Procedure for Posting
1) Collect the following values
a) Our thread topic, for example "My Test Topic"
b) Our thread text, for example "This is the text in my post...",
c) The ID of the board we want to post on, found in the table kunena_categories,
d) The ID of the user we want to post as, found in the table kunena_users,
e) The name of the user we want to post as, found in the table, kunena_users.
2) Insert the topic into kunena_topics, using values a to e from above;
"INSERT INTO kunena_topics (params, posts, rating, category_id, first_post_userid, last_post_userid, first_post_guest_name, last_post_guest_name, first_post_time, last_post_time, subject, first_post_message, last_post_message) VALUES (\"\", 0, 0, c, d, d, e, e, UNIX_TIMESTAMP(), UNIX_TIMESTAMP(), a, b, b);"
3) Record the ID of the topic just created, we will refer to this as 'f' in this example.
4) Post the topic in kunena_messages, using the values from above;
"INSERT INTO kunena_messages (catid, userid, name, time, thread, subject) VALUES (c, d, e, UNIX_TIMESTAMP(), f, a);"
5) Record the ID of the row we just inserted into kunena_messages, we will refer to this as 'g'.
6) Post our message text in kunena_messages_text;
"INSERT INTO kunena_messages_text (message, mesid) VALUES (b, g);"
7) Update the row we inserted into kunena_topics in step 2 to provide the required post IDs;
"UPDATE in5op_kunena_topics SET first_post_id=g, last_post_id=g WHERE id=f;"
8) Increment the post count and update the last post IDs in the category table;
"UPDATE in5op_kunena_categories SET numTopics=numTopics + 1, numPosts=numPosts + 1, last_topic_id=f, last_post_id=g WHERE id=c;"
Next I wrote a small script to search for potentially relevant news articles. I decided to write this script in Python as it has simple libraries for just about anything you can think of, and thanks to the main news sites still using RSS feeds, gathering articles together couldn't be easier.
The example in this image searches the RSS feeds defined in 'sources' for any articles that include a word or phrase defined in the Words of Interest list ('wois'), then prints the link to each matching article.
When executed this script will show something like this...
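A minimal sketch of this kind of search, not the original script, might look like the following. It uses only the Python standard library, assumes plain RSS 2.0 feeds, and matches on article titles. The feed URL, the word list and the function names matching_links and scan are placeholders chosen for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder configuration; the real 'sources' and 'wois' values differ.
sources = ["https://example.com/rss.xml"]
wois = ["climate", "carbon"]


def matching_links(rss_xml, wois):
    """Return the links of RSS items whose title contains any word of interest."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").lower()
        link = item.findtext("link") or ""
        # Plain substring test, so 'eco' would also match 'become'.
        if link and any(woi.lower() in title for woi in wois):
            links.append(link)
    return links


def scan(sources, wois):
    """Download each feed and print the links of matching articles."""
    for url in sources:
        with urllib.request.urlopen(url) as response:
            for link in matching_links(response.read(), wois):
                print(link)
```

Keeping the matching in its own function means it can be tried on a saved feed without touching the network.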
Great, so at this point we know how to post the topic from within the database, and we know how we are going to gather news articles. Now we just need to work out how we are going to curate the results from our Python script before they are posted on the forums.
To manage the results I decided to create an Android app, simply because I wanted the experience of building something in Android Studio rather than using a webview or working in the NDK, both of which I have already spent much more time with. For the script to share its results with the Android app I also needed to create a web service. In this case I felt the quickest approach would be to post the Python script's results to my database, then provide them to the app via a PHP script. Once articles are approved via the app and PHP script, it is up to the Python script to check the 'approved' list and create the corresponding topics on the livelife.green forums. I could have had the app itself scan the news sites for results, but I wanted a server-side script to ensure it is run regularly and I don't miss anything. Additionally, the PHP script could have posted things straight to the forums, but I decided it would be best to have a small time window in which I could go back and reject results after approving them, in case I make an error or simply change my mind.
So at this point we have 3 different programs. The Python script searches for news articles, puts them in a database, then checks the database for approved results and posts those to the forums. The PHP script provides results from the database to our mobile app and updates the status of articles in the database at the request of the app. Finally, the Java Android app displays results to a moderator to be approved or rejected. I have retrospectively created a flow chart of the main functions of each of these programs just to make things nice and clear.
The Python Script (FLOWCHART [svg][png][dia])
The top portion of our Python script is dedicated to configuration. The ‘sources’ list holds the RSS feeds to monitor, along with the name of each feed provider, which we use as the prefix for the topic name when we post it to the forums. Initially I was just using the provider names supplied by the RSS feeds themselves, but these proved a little messier than I wanted. This way seemed like the most elegant option.
‘wois’ lists words of interest. These are the words or phrases that the script will use to decide if an article should be put on a list for review. In this example we have a few words I initially chose for testing. Something worth noting is that the first value, ‘eco’, was a poor choice, as it appears inside common words like ‘become’, ‘second’ and ‘recognise’. Thus begins the battle to find words and phrases that are general enough to cover the topics you want, but not so general that they start selecting a large number of unrelated articles.
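One possible way to soften the ‘eco’ problem, which the script described here does not do, is to match whole words or phrases with a regular expression instead of testing for a plain substring. The sketch below assumes Python's standard re module; the function name compile_wois and the sample word list are illustrative only.

```python
import re


def compile_wois(wois):
    """Build one case-insensitive pattern that matches any woi as a whole word or phrase."""
    alternatives = "|".join(re.escape(woi) for woi in wois)
    # \b is a word boundary, so 'eco' no longer matches inside 'become'.
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


pattern = compile_wois(["eco", "climate crisis"])


def is_of_interest(title):
    """Return True if the title contains any word of interest as a whole word."""
    return pattern.search(title) is not None
```

A hyphen counts as a word boundary, so a title containing ‘eco-friendly’ would still match, while ‘second’ and ‘recognise’ would not.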
The next part of the script holds our database details and the posting information for the bot, such as my Joomla database prefix, my bot’s username and ID, and the category ID of the board I want to post to on the Kunena forums. Although this script is very specific and it’s unlikely that anybody would want to download it and use it as is, I felt it was worth storing these values at the top anyway so that I could keep them secret without any hassle when pushing the script up to GitHub.
After this small section of configuration we jump into the first large part of the script, the part that checks for new articles and adds them to the database. This part of the script is pretty self-explanatory. For each feed we go through each article title, iterating through each word of interest. If the word of interest is in the title, we check whether the link for the article has already been added to our article list in the database. If it hasn’t, we add it.
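The check-then-add step can be sketched roughly as follows. This is a simplified stand-in rather than the project's code: it uses an in-memory SQLite database so that it runs on its own, and the column names (title, url, status) and function names are assumptions. Only the articleList table name and the ‘Not reviewed’ status code of 2 come from this write-up.

```python
import sqlite3

STATUS_NOT_REVIEWED = 2  # 'Not reviewed' in the article list status codes


def open_demo_db():
    """Create an in-memory stand-in for the article list table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articleList ("
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT, status INTEGER)"
    )
    return conn


def add_if_new(conn, title, url):
    """Insert the article unless its link is already listed; return True if inserted."""
    exists = conn.execute(
        "SELECT 1 FROM articleList WHERE url = ?", (url,)
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO articleList (title, url, status) VALUES (?, ?, ?)",
        (title, url, STATUS_NOT_REVIEWED),
    )
    conn.commit()
    return True


def queue_matches(conn, entries, wois):
    """Queue every (title, link) pair whose title contains a word of interest."""
    added = 0
    for title, link in entries:
        if any(woi.lower() in title.lower() for woi in wois):
            if add_if_new(conn, title, link):
                added += 1
    return added
```

Because the link is checked before inserting, running the scan repeatedly does not queue the same article twice.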
The final part of the script then gets a list of articles from the database where the status is ‘3’, or ‘Approved’. We then post them to the forum’s database one by one using the ‘Procedure for Posting’ we listed above.
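The ‘Procedure for Posting’ can be sketched as a single function. This is a simplified illustration rather than the code from the repository: it runs against an in-memory SQLite database with a cut-down, hypothetical schema so that it is self-contained, uses Python's clock in place of MySQL's UNIX_TIMESTAMP(), and assumes that step 6 is meant to store the post body (value b). The names post_topic and open_demo_forum and the prefix argument are illustrative. Most MySQL drivers for Python use %s placeholders rather than ?.

```python
import sqlite3
import time


def open_demo_forum():
    """In-memory stand-in for the Kunena tables, with only the columns used below."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE kunena_topics (
            id INTEGER PRIMARY KEY, params TEXT, posts INTEGER, rating INTEGER,
            category_id INTEGER, first_post_id INTEGER, last_post_id INTEGER,
            first_post_userid INTEGER, last_post_userid INTEGER,
            first_post_guest_name TEXT, last_post_guest_name TEXT,
            first_post_time INTEGER, last_post_time INTEGER,
            subject TEXT, first_post_message TEXT, last_post_message TEXT);
        CREATE TABLE kunena_messages (
            id INTEGER PRIMARY KEY, catid INTEGER, userid INTEGER, name TEXT,
            time INTEGER, thread INTEGER, subject TEXT);
        CREATE TABLE kunena_messages_text (mesid INTEGER, message TEXT);
        CREATE TABLE kunena_categories (
            id INTEGER PRIMARY KEY, numTopics INTEGER, numPosts INTEGER,
            last_topic_id INTEGER, last_post_id INTEGER);
        INSERT INTO kunena_categories VALUES (7, 0, 0, 0, 0);
    """)
    return conn


def post_topic(conn, subject, text, category_id, user_id, user_name, prefix=""):
    """Follow steps 2 to 8 of the procedure; return (topic_id, message_id)."""
    now = int(time.time())  # stands in for UNIX_TIMESTAMP()
    cur = conn.cursor()
    # Step 2: insert the topic row.
    cur.execute(
        f"INSERT INTO {prefix}kunena_topics (params, posts, rating, category_id, "
        "first_post_userid, last_post_userid, first_post_guest_name, "
        "last_post_guest_name, first_post_time, last_post_time, subject, "
        "first_post_message, last_post_message) "
        "VALUES ('', 0, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (category_id, user_id, user_id, user_name, user_name, now, now,
         subject, text, text),
    )
    topic_id = cur.lastrowid  # Step 3: value 'f'
    # Step 4: insert the message row that belongs to the topic.
    cur.execute(
        f"INSERT INTO {prefix}kunena_messages "
        "(catid, userid, name, time, thread, subject) VALUES (?, ?, ?, ?, ?, ?)",
        (category_id, user_id, user_name, now, topic_id, subject),
    )
    message_id = cur.lastrowid  # Step 5: value 'g'
    # Step 6: store the post body (assumed to be value b).
    cur.execute(
        f"INSERT INTO {prefix}kunena_messages_text (message, mesid) VALUES (?, ?)",
        (text, message_id),
    )
    # Step 7: point the topic at its first and last post.
    cur.execute(
        f"UPDATE {prefix}kunena_topics SET first_post_id=?, last_post_id=? WHERE id=?",
        (message_id, message_id, topic_id),
    )
    # Step 8: bump the category counters and last-post pointers.
    cur.execute(
        f"UPDATE {prefix}kunena_categories SET numTopics=numTopics + 1, "
        "numPosts=numPosts + 1, last_topic_id=?, last_post_id=? WHERE id=?",
        (topic_id, message_id, category_id),
    )
    conn.commit()
    return topic_id, message_id
```

Committing once at the end means a failure part-way through does not leave a half-created topic behind, which matters because a topic without its message rows would not show up on the front end.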
This python script is executed every 3 hours by a cron job.
Before writing things I came up with a quick list of status and action codes that could be common across the scripts. These codes are as follows:
Log status codes:
2 - New to DB
3 - Approved
4 - Rejected
5 - Posted
Articles list status codes:
2 - Not reviewed
3 - Approved
4 - Rejected
5 - Posted
Request type codes:
0 - Check for unreviewed urls
1 - Test Login
2 - Get unreviewed items
3 - Get approved items
4 - Get rejected items (last 10)
5 - Get posted items (last 10)
8 - Approve Item
9 - Reject Item
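If these codes were shared as Python constants, one way to write them down would be as enumerations. This is only a sketch: the class and member names are my own labels for this example, while the numbers are the ones listed above, with the last group (0 to 9) treated as the request types sent by the app.

```python
from enum import IntEnum


class ArticleStatus(IntEnum):
    """Status of an entry in the articles list."""
    NOT_REVIEWED = 2
    APPROVED = 3
    REJECTED = 4
    POSTED = 5


class RequestType(IntEnum):
    """Action requested by the mobile app."""
    CHECK_UNREVIEWED = 0
    TEST_LOGIN = 1
    GET_UNREVIEWED = 2
    GET_APPROVED = 3
    GET_REJECTED = 4
    GET_POSTED = 5
    APPROVE_ITEM = 8
    REJECT_ITEM = 9


def status_for_list_request(request_type):
    """Request types 2 to 5 share their number with the status they list."""
    return ArticleStatus(int(request_type))
```

The list requests deliberately reuse the status numbers, so a request for type 3 returns the articles whose status is 3.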
The PHP Script (FLOWCHART [svg][png][dia])
The PHP script has a settings file, ‘llgNewsBotSettings.php’, containing our database information. We keep it in a ‘local’ directory that cannot be accessed externally via the web server.
The main PHP script is ‘llgNewsBotProvider.php’, which can be found in our ‘www’ directory.
This script is called from the mobile app and first checks that the user has provided valid login details in JSON format. Once the login information has been confirmed it then does something depending on the ‘requestType’. The request types are defined in our ‘codes’ list above.
Request type 0 is reserved in case I later want to implement mobile notifications when there are new articles on the list. For now I deemed it unnecessary, as I don’t want more notifications and, somewhat sadly, there are pretty much guaranteed to be a few articles on the climate crisis each day, so if I check once a day I can just sort them then. If I started to forget, or articles became infrequent, then I would add notifications.
Request type 1, “Test Login”, is used by the app to check if the authentication is successful when a user logs in with a new username or password or when the app loads.
Request types 2, 3, 4 and 5 are used to request a list of articles from the articleList database table by status, each one corresponding to the article status with the same number. Requests for articles with the status of 4, “Rejected”, and 5, “Posted”, are limited to 10 items.
Request types 8 and 9 are used to update the status of an article in the articleList table. Type 8 updates the status of the item defined by the ‘id’ JSON field to “Approved”, and type 9 updates it to “Rejected”.
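The dispatch logic described above can be sketched in Python, used here instead of PHP purely to keep the examples in one language. It is not a translation of ‘llgNewsBotProvider.php’: an in-memory list stands in for the articleList table, and the ‘username’ and ‘password’ field names and the shape of the responses (‘ok’, ‘error’, ‘articles’) are assumptions. Only ‘requestType’, ‘id’ and the code numbers come from the description.

```python
import json

# Rejected (4) and posted (5) lists are limited to the last 10 items.
LIST_LIMITS = {4: 10, 5: 10}


def handle_request(body, articles, check_login):
    """Return a JSON response string for one JSON request from the app."""
    request = json.loads(body)
    # Every request must carry valid login details.
    if not check_login(request.get("username"), request.get("password")):
        return json.dumps({"ok": False, "error": "login failed"})
    request_type = request.get("requestType")
    if request_type == 1:  # Test Login
        return json.dumps({"ok": True})
    if request_type in (2, 3, 4, 5):  # list articles with the matching status
        matches = [a for a in articles if a["status"] == request_type]
        limit = LIST_LIMITS.get(request_type)
        if limit is not None:
            matches = matches[-limit:]  # assumes the list is ordered oldest first
        return json.dumps({"ok": True, "articles": matches})
    if request_type in (8, 9):  # approve (8) or reject (9) one item
        new_status = 3 if request_type == 8 else 4
        for article in articles:
            if article["id"] == request.get("id"):
                article["status"] = new_status
                return json.dumps({"ok": True})
        return json.dumps({"ok": False, "error": "unknown id"})
    # Type 0 is reserved and anything else is unknown.
    return json.dumps({"ok": False, "error": "unsupported requestType"})
```

Checking the login before looking at the request type means no request, including the list requests, is answered without valid credentials.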
The Java mobile app
globals.java (FLOWCHART [svg][png][dia])
The first class I have is ‘globals’. This class contains some static variables, functions for creating JSON objects containing our authentication details, a function for posting our requests to the PHP script, a function to display an alert dialog, and finally the asynchronous class that tests for a successful login, which is used in the SplashActivity and LoginActivity.
SplashActivity.java (FLOWCHART [svg][png][dia])
This is the first activity that loads whenever you open the app. If no password has been saved it simply loads the LoginActivity. Otherwise it calls ‘testLogin’ from our globals class, switching to the LoginActivity if there are any problems or to the MainActivity after a successful login.
LoginActivity.java (FLOWCHART [svg][png][dia])
This activity has our login form on its layout. When it loads it shows any warnings left over from the SplashActivity that haven’t been displayed yet. When the user presses the “login” button it calls the ‘testLogin’ function, loading the MainActivity on success, or otherwise showing a warning containing the reason for the failure.
MainActivity.java (FLOWCHART [svg][png][dia])
This class is the real guts of the app. The layout for it has a list view and a button. The list is used to display lists of articles and the menu, keeping things very simple.
When the activity opens it attempts to load a list of “unreviewed” articles. Articles in the list are displayed with their ID in the news bot’s database, the title of the article and the URL of the article. If you click on one of these items, the list is cleared and the article you just clicked is shown at the top, with the options for that article displayed below. Clicking on an ‘unreviewed’ article will bring up the options ‘Approve’ and ‘Reject’. Articles with an ‘Approved’ status will show the option ‘Reject’. ‘Rejected’ articles will show just the ‘Approve’ option. ‘Posted’ articles will not show any options.
Once in this view, with the clicked item shown at the top, clicking the item again will open the URL in the phone’s web browser. Clicking on one of the options will set the article’s status to the one you chose.
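The rule for which options appear is small enough to write as a lookup table. The sketch below is in Python rather than the app's Java, and the names OPTIONS_BY_STATUS and options_for are made up for this example. The status numbers are the article list status codes.

```python
# Options shown under an article, keyed by its status code.
OPTIONS_BY_STATUS = {
    2: ["Approve", "Reject"],  # Not reviewed
    3: ["Reject"],             # Approved
    4: ["Approve"],            # Rejected
    5: [],                     # Posted: nothing left to decide
}


def options_for(status):
    """Return the option labels to show for an article with this status."""
    return OPTIONS_BY_STATUS.get(status, [])
```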
Pressing the menu button lists our menu options in the same list view that we use for everything else. From here you can select ‘Unreviewed’, ‘Approved’, ‘Rejected’ and ‘Posted’, each bringing up a list of articles with the status you selected. Finally there is a logout option, which clears the saved username and password and then switches to the login activity.