Setting up ROUGE

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics and a software package used to evaluate automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.
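To make "compare against a reference" concrete, here is a minimal sketch of ROUGE-N recall in plain Python: the number of reference n-grams that also appear in the candidate (clipped by count), divided by the total number of n-grams in the reference. This is only an illustration and not the ROUGE-1.5.5 script. The official Perl package also supports stemming, stopword removal and multiple references, and it reports precision and F-measure as well. The lower-case, whitespace-split tokenisation below is an assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall for one candidate summary against one reference.

    Matches are clipped: an n-gram that appears twice in the reference
    but once in the candidate counts once. Tokenisation (lower-casing and
    whitespace splitting) is an assumption made for this sketch.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    return overlap / total


# 5 of the 6 reference unigrams match, so this prints 0.8333333333333334
print(rouge_n_recall("the cat was on the mat", "the cat sat on the mat", n=1))
```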

While working on my summarisation project, I had to set up ROUGE to compare my generated summaries against the reference summaries. There's a lot of content on almost every topic on the internet, but only a few articles cover installing ROUGE, and none of them gave the full sequence of steps and requirements. I wished someone had written a blog post about it. The information was out there, but I had to piece it together from multiple sources. So let me make things easier for future researchers in this post.

Steps

1. Download ROUGE-1.5.5. A copy is included in the andersjo/pyrouge repository; you only need its ROUGE-1.5.5 directory.

git clone https://github.com/andersjo/pyrouge.git
cd pyrouge/tools/ROUGE-1.5.5

2. Check whether Perl is installed. If it isn't, install it.

On Ubuntu, run

sudo apt-get install perl

3. ROUGE needs the Perl module XML::DOM. To install it, we use the Synaptic package manager, so install Synaptic first:

sudo apt-get update
sudo apt-get install synaptic

4. Once Synaptic Package Manager is installed, search for it in your applications and launch it.

Once the package manager is open, search for “libxml-dom-perl”.

Click Mark for Installation and apply the changes.

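5. The environment variable ROUGE_EVAL_HOME must be set to point to ROUGE's data directory. Note that export only lasts for the current shell; add the line to your ~/.bashrc to make it permanent.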

export ROUGE_EVAL_HOME="/home/poojitha/pyrouge/tools/ROUGE-1.5.5/data/"

6. To avoid errors about the WordNet exceptions database, rebuild it by running these commands from the ROUGE-1.5.5 directory.

cd data/WordNet-2.0-Exceptions/
./buildExeptionDB.pl . exc WordNet-2.0.exc.db

cd ../
ln -s WordNet-2.0-Exceptions/WordNet-2.0.exc.db WordNet-2.0.exc.db
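Before moving on, you might want to confirm that steps 5 and 6 took effect. The sketch below uses a hypothetical helper, check_rouge_data_dir, which is my own and not part of ROUGE. It assumes it is run from a shell where ROUGE_EVAL_HOME has been exported, and it checks that the variable points to an existing directory containing the WordNet-2.0.exc.db link created in step 6.

```python
import os
from pathlib import Path


def check_rouge_data_dir(env_var="ROUGE_EVAL_HOME"):
    """Return a list of problems with the ROUGE data directory (empty if OK).

    Hypothetical helper: it only checks the two things set up in steps 5 and 6.
    """
    data_dir = os.environ.get(env_var)
    if not data_dir:
        return [f"{env_var} is not set"]
    path = Path(data_dir)
    if not path.is_dir():
        return [f"{env_var} points to a missing directory: {path}"]
    problems = []
    # Step 6 creates this symlink inside the data directory. Path.exists()
    # follows symlinks, so a dangling link is reported as missing too.
    db = path / "WordNet-2.0.exc.db"
    if not db.exists():
        problems.append(f"missing {db} (re-run the WordNet step)")
    return problems


if __name__ == "__main__":
    issues = check_rouge_data_dir()
    print("\n".join(issues) if issues else "ROUGE data directory looks OK")
```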

ROUGE is now installed, hurray!

Setting up pyrouge

pyrouge is a Python wrapper for the ROUGE summarization evaluation package. Getting ROUGE to work can require quite a bit of time. pyrouge is designed to make getting ROUGE scores easier by automatically converting your summaries into a format ROUGE understands, and automatically generating the ROUGE configuration file.

At the time of writing, the PyPI version of pyrouge is deprecated, so let’s get the latest version from the repository:

git clone https://github.com/bheinzerling/pyrouge.git
cd pyrouge

Install pyrouge using

sudo python setup.py install

After installing, set the ROUGE path with the command

pyrouge_set_rouge_path /home/poojitha/pyrouge/tools/ROUGE-1.5.5/

(In general: pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5/)

Test if everything’s installed by running

python pyrouge-tests.py

If the above command outputs “OK”, everything has been installed properly.
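pyrouge's Rouge155 class reads summaries from a system directory and a model (reference) directory and matches the files against filename patterns. The sketch below is a hypothetical helper, write_for_pyrouge, of my own. It assumes one system summary and one or more reference summaries per document, and it writes them using the naming scheme shown in the pyrouge README. The directory names and the "summary" prefix are my choices. The commented lines at the end show how the README scores such directories. They need pyrouge and ROUGE-1.5.5 installed as above, so the sketch itself does not call them.

```python
import string
from pathlib import Path


def write_for_pyrouge(system_summaries, reference_summaries, root, prefix="summary"):
    """Write summaries to <root>/system and <root>/model.

    System files are named <prefix>.<ID>.txt and reference (model) files
    <prefix>.<LETTER>.<ID>.txt, one letter per reference.
    reference_summaries[i] is the list of reference texts for document i.
    """
    root = Path(root)
    system_dir, model_dir = root / "system", root / "model"
    system_dir.mkdir(parents=True, exist_ok=True)
    model_dir.mkdir(parents=True, exist_ok=True)
    pairs = zip(system_summaries, reference_summaries)
    for i, (system_text, references) in enumerate(pairs, start=1):
        doc_id = f"{i:03d}"
        (system_dir / f"{prefix}.{doc_id}.txt").write_text(system_text)
        # Label references A, B, C, ... (this sketch supports at most 26).
        for letter, ref_text in zip(string.ascii_uppercase, references):
            (model_dir / f"{prefix}.{letter}.{doc_id}.txt").write_text(ref_text)
    return system_dir, model_dir


# Scoring the directories with pyrouge, following its README:
#
#   from pyrouge import Rouge155
#   system_dir, model_dir = write_for_pyrouge(systems, references, "rouge_input")
#   r = Rouge155()
#   r.system_dir, r.model_dir = str(system_dir), str(model_dir)
#   r.system_filename_pattern = r"summary.(\d+).txt"
#   r.model_filename_pattern = "summary.[A-Z].#ID#.txt"
#   output = r.convert_and_evaluate()
#   print(r.output_to_dict(output))
```

The (\d+) group in the system pattern captures the document ID, and pyrouge substitutes that ID for #ID# when it looks for the matching reference files.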

 

References:

  1. https://en.wikipedia.org/wiki/ROUGE_(metric)
  2. http://kavita-ganesan.com/rouge-howto/
  3. https://stackoverflow.com/a/28941840/8800466
  4. https://github.com/bheinzerling/pyrouge

 


SSH Hacks

Jupyter Notebooks

The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser.

You’re starting to experiment with an unknown library, and you write a big chunk of code. There are too many errors, so you start debugging, but in the end you give up hope. Sad story, eh?

But what if you could write small chunks, check that each one runs fine and then move on to the next? That’d be so great. This is what notebooks achieve, and it’s why I love them.

Remote port forwarding / reverse SSH tunneling for Jupyter notebooks

My laptop’s too slow to even run a hello world program, so I run my code on my friend’s powerful machine. And I want to run not just any code, but a Jupyter notebook. This is how I do it.

First, I SSH into my friend’s machine. Now, the

  • Current machine is my friend’s powerful machine – let it be C,
  • Remote machine is the local machine, my very slow laptop – R

Start a Jupyter notebook on C in a new session, using

jupyter-notebook --no-browser --port 8080

In another session on C, run this:

ssh -N -f -R <portR>:localhost:<portC> <user_name>@<local_machine_ip(R's ip)>

<portR> is the port on R that we wish to use.

<portC> is the port currently used on C; in this case, it’s 8080.

<user_name>@<local_machine_ip> is the address of my slow laptop (R).

Now I can open the Jupyter notebook on my laptop (R) at http://localhost:<portR> and perform expensive operations using C’s resources.
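If the browser on R shows nothing, a quick first check is whether anything is listening on the forwarded port. The sketch below uses a hypothetical helper, tunnel_is_up, which I wrote for this purpose and which uses only Python's standard socket module. Run it on R with the <portR> you chose. It only tells you that the port accepts TCP connections. It does not verify that the Jupyter server on C is the process answering.

```python
import socket


def tunnel_is_up(port, host="localhost", timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that the forwarded port is open; it does not check
    that the process behind it is actually the Jupyter server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False
```

For example, tunnel_is_up(8888) on R returns True while an ssh -R 8888:localhost:8080 tunnel from C is running, and False once it has been closed. The port 8888 here is an arbitrary example.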

Using a bind address

The syntax of the -R option, from the ssh man page:

-R [bind_address:]port:host:hostport

Now suppose my friend wants to access the Jupyter notebook through my machine. If I send the address <R's ip>:<portR>, my friend won’t be able to reach it, because by default sshd on R binds remote forwarded ports to the loopback address only.

To allow non-local hosts to connect to R:<portR> and be forwarded to localhost:<portC> on C, follow these steps.

R$ grep GatewayPorts /etc/ssh/sshd_config
#GatewayPorts no

In R’s /etc/ssh/sshd_config file, add

GatewayPorts clientspecified

Restart sshd using

R$ sudo service sshd restart

and run this on C:

ssh -N -f -R 0.0.0.0:<portR>:localhost:<portC> <user_name>@<local_machine_ip>

or

ssh -N -f -R \*:<portR>:localhost:<portC> <user_name>@<local_machine_ip>

or

ssh -N -f -R "[::]:<portR>:localhost:<portC>" <user_name>@<local_machine_ip>

If you do this very often, set up a special host in ~/.ssh/config on C:

Host laptop
    HostName <R's ip>
    User <user_name>
    RemoteForward <portR> localhost:<portC>

With this entry in place, ssh -N -f laptop opens the same tunnel.

Arguments

  • -N says that you want an SSH connection, but you don’t actually want to run any remote commands. If all you’re creating is a tunnel, then including this option saves resources.
  • -R specifies that the given port on the remote (server) host is to be forwarded to the given host and port on the local side.
  • -f requests ssh to go to the background just before command execution.

References

  1. http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html
  2. http://linuxcommand.org/lc3_man_pages/ssh1.html
  3. https://unix.stackexchange.com/a/87459
  4. https://unix.stackexchange.com/questions/162093/reverse-ssh-tunnel-in-config

Some hacks

Share buttons

1. Sharing buttons – default.

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>
<p> Share on </p>
<span class='st_facebook_large' displayText='Facebook'></span>
<span class='st_twitter_large' displayText='Tweet'></span>
<span class='st_googleplus_large' displayText='Google +'></span>
<span class='st_linkedin_large' displayText='LinkedIn'></span>

2. Sharing custom text/image/url

Use properties like

st_url Specifies URL (can be shortened URL) that you would like shared
st_title Specifies title that you would like shared
st_image Specifies link to image you would like displayed in the shared content
st_summary Specifies summary text/description you wish to share
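These st_* properties are ordinary HTML attributes on the span, so any text placed in them has to be HTML-escaped. Otherwise a quote in a title would break the markup. Django templates escape {{ share_message }} automatically. If you ever build the markup in Python instead, the hypothetical helper below shows the idea. It is my own sketch and not part of ShareThis, and it hard-codes the "_large" button style used in these examples.

```python
from html import escape


def sharethis_span(service, display_text, **st_props):
    """Build a ShareThis <span>, e.g. service="twitter" -> class st_twitter_large.

    Keyword arguments such as st_title, st_url, st_image or st_summary
    become attributes; every value is HTML-escaped.
    """
    attrs = [f'class="st_{service}_large"', f'displayText="{escape(display_text)}"']
    attrs += [f'{name}="{escape(value)}"' for name, value in st_props.items()]
    return "<span " + " ".join(attrs) + "></span>"


print(sharethis_span("twitter", "Tweet", st_title="This is custom text #Blog #tweet @systers_org"))
```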

With a custom tweet text, the code above changes to something like this:

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>
<p> Share on </p>
<span class='st_facebook_large' displayText='Facebook'></span>
<span class='st_twitter_large' displayText='Tweet' st_title="This is custom text #Blog #tweet @systers_org"></span>
<span class='st_googleplus_large' displayText='Google +'></span>
<span class='st_linkedin_large' displayText='LinkedIn'></span>

3. Sharing custom text dynamically (Django)

<script type="text/javascript" src="http://w.sharethis.com/button/buttons.js"></script>
<p> Share on </p>
<span class='st_facebook_large' displayText='Facebook'></span>
<span class='st_twitter_large' displayText='Tweet' st_title="{{ share_message }}"></span>
<span class='st_googleplus_large' displayText='Google +'></span>
<span class='st_linkedin_large' displayText='LinkedIn'></span>

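In the corresponding view, add the share message to the context. It looks something like this (assuming a DetailView, which sets self.object):

from django.views.generic import DetailView

class ExampleView(DetailView):
    def get_context_data(self, **kwargs):
        context = super(ExampleView, self).get_context_data(**kwargs)
        # Concatenate the object's title with the handle to build the share text
        context['share_message'] = self.object.title + " @systers_org"
        return context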

 

Reference: Custom Buttons

Journey with Systers so far…

It’s been a few weeks since I have started my internship, and there’s a lot to tell.

The community bonding period flew by quickly because I was busy with my end-semester exams. With my exams finished and my winter break starting, I had some free time before the official internship period began. During that time, I doubted whether I could finish what I had proposed and get the portal all set to serve the world. I wanted to get rid of the anxiety and assure myself that I could do it, so I started working a few days before the start date. I was able to finish some tasks without much difficulty. Yay!

After that, I was eagerly waiting for the internship to begin so I could submit the first of my many pull requests 😛 Working before the official start date made me less nervous and more confident. Doubts washed away. Whoosh!

I did make a few resolutions on my first day. Some of them were: working sincerely throughout, writing neater code than I usually write in my personal uni projects, writing blogs, getting more involved with the community and learning more as the internship progresses.

Good Code

 

So, as an intern, I had to attend meetings. Meetings with May, the mentors and my fellow Outreachy interns were so much fun. They weren’t what I thought they would be like; they were so coool. In some meetings we discussed the portal’s features and set up the timeline, which was work-related. Other meetings weren’t about work at all: we played a game of guessing each other’s tastes (to get to know my fellow Outreachy interns) and talked about random stuff everyone was interested in.

Coding is so much fun too, if only there were no errors XD. But the truth is that everyone gets errors and gets stuck once in a while. Getting stuck on an error, no matter how small, doesn’t mean that we’re stupid.

I didn’t know Django before, but, hey, now I do. I knew I was good at Python, MVC architectures and MySQL, so I knew I would eventually be good at Django too. I already understood most of the things I learned (listed below), such as forms, views and migrations; I had just never coded them in Django. I’m somewhat confident in Django now, which I wasn’t until I had solved these tasks.

 

  1. The first task, Adding a new community form for admin [1] – I learned how to add a form, a view with its context and a template, and how to write tests.
  2. Another task, Creating New community requests [1] – I learned so much from this one task alone: creating a model, understanding migrations, adding permissions and groups after understanding signals, writing the logic for the approve and reject features, and sending messages based on the statuses of the task. One satisfying thing was that I was able to add all of them in a week, although I put a lot of effort into it that week.
  3. Creating checkboxes – I was pondering which model field type to use. Reddit’s r/Django helped me. I used the widget and was able to display checkboxes, but I wasn’t able to store the selection in my DB, and it took me quite some time to figure out what was happening. I created a new CharField in the model, a MultipleChoiceField with the CheckboxSelectMultiple widget in the model form for the same field, and a clean_<field>() method to convert the checkbox input into a string.
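The conversion in the third task is easy to get wrong, so here is a framework-free sketch of it. A checkbox field hands back a list of the ticked values, and a CharField stores one string. The helper names are hypothetical, and the comma separator is an assumption, so the approach breaks if a choice value itself contains a comma. The comment shows where the first function would sit in a Django form. The field name "roles" is a made-up example.

```python
def checkboxes_to_char(selected, sep=","):
    """Join the values ticked in a multiple-choice checkbox field into one string."""
    return sep.join(selected)


def char_to_checkboxes(stored, sep=","):
    """Split a stored string back into the list of ticked values (e.g. for initial data)."""
    return [value for value in stored.split(sep) if value]


# In a Django ModelForm this would sit in a clean_<field>() method, e.g.
#
#     def clean_roles(self):
#         return checkboxes_to_char(self.cleaned_data["roles"])
```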

Python

As I was working, there were times when I was coding super fast, writing flawless, neat code, and there were also times when every line I wrote threw errors and Stack Overflow seemed like a savior. It wasn’t always rainbows. I faced a few errors that, when googled, were present on Stack Overflow, but none of the answers worked for me. I then spent hours and days looking at the errors to analyze them and come up with my own solution. One such instance is this [1]: I tried every answer on the web (which took me many hours), yet nothing solved it. I gave up looking for answers online and came up with my own fix. But there were some errors that I just couldn’t fix even after staring at them for days and trying everything, like this one [1].

Fixing Problems

My Systers internship in Outreachy has been going well, with ups and downs, and with a lot to contribute, collaborate on and learn. I hope to do well in the rest of my internship and get the portal all set for production. I thank my mentors Tapasweni Pathak, Mansimar, and May for making my journey terrific so far.

Git, Alright!

Although I have been using git for a while, I had never encountered error messages like the ones I see these days. Every time I pushed to my open pull request, its commit history magically turned into a long list of everyone’s commits.

This happened because I kept using git rebase -i HEAD~x without deleting the pick lines in the file. The length of that list of everyone’s commits depended on x.

I made one such mistake recently. The long list of commits made me sweat, and I chose the easy way: I closed the PR.

I opened a new one. Again, I used git rebase -i HEAD~2. This time, there were 2 unwanted commits.

I tried cherries now. Just kidding! I commanded my terminal to run git cherry-pick SHA. Nope. Didn’t work. Now, one more unwanted commit joined the club, adding to my woes.

After losing hope, I dashed to YouTube in search of good music. There, coincidentally, I found this.

Remember that nobody’s born wise. Not just me! 😛

Ah, finally! After googling and failing to understand what was happening, I found the necessary command in a YouTube video.

git rebase develop

Hooray! It worked. Phew! Lesson learned: use rebase wisely.

 

How I migrated systers/portal to latest versions [In progress]

Heading edited from “Migrating systers/portal to latest versions” to “How I migrated systers/portal to latest versions”. P.S. Not clickbait 😛

Before solving this issue [PR], I thought migrating the portal would be something huge and hard. Although it is huge, it’s definitely not hard. It just needs some time, knowing how to google well and, more importantly, patience. Throughout, my mood ranged from ‘I have no clue’ to ‘Eureka! I did it’. Believe me, I was elated when I finally got it to work; solving no other bug has given me such immense happiness.

https://imgs.xkcd.com/comics/wisdom_of_the_ancients.png

Most often, people eye only the achievement and undervalue the work that goes into it. One of the reasons I am writing this post is to remind someone like me not to quit when it’s hard. Keep trying and keep rewarding yourself for small progress. Even when you think there’s no progress and you don’t know what to do next, realise that you are better than you were before. Okay, I don’t want to go all philosophical in this 😛

Another reason is that anyone who wants to migrate another project shouldn’t have to reinvent the wheel, i.e. go through the same stages as me. Hence, I included the thought processes that went through my mind until I was finally able to migrate it.

Although I was nowhere near completion, I recorded my progress here coz I knew I would solve this someday. B)

24th October: I noticed that I didn’t have Python 3.6 installed, so I installed it and created the virtual environment with it. Then I removed all the version numbers from the requirements files and set the Django version to 1.11.6.

python manage.py runserver

ImportError: No module named apps

So I checked the Django documentation and found that an app’s configuration class, such as polls.apps.PollsConfig in the tutorial, has to be added to INSTALLED_APPS in the settings file.

Then I figured I needed to read this.
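For reference, the snippet below shows the pairing the Django 1.11 tutorial uses. It is a settings fragment, not the portal's real settings, whose app names the post does not list. An INSTALLED_APPS entry ending in .apps.<Name>Config is a dotted path to an AppConfig subclass. If the app has no apps.py module, importing that path fails with an ImportError for the apps module, which is the kind of error shown above.

```python
# settings.py (fragment), as in the Django 1.11 tutorial: the first entry is
# the dotted path to the polls app's AppConfig subclass.
INSTALLED_APPS = [
    "polls.apps.PollsConfig",
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
]

# The matching polls/apps.py that the first entry points to:
#
#   from django.apps import AppConfig
#
#   class PollsConfig(AppConfig):
#       name = "polls"
```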

26th October

After taking a one-day break from this issue, I had a eureka moment. Instead of trying to read all the documentation, I would make a new dummy app and compare the old urls and settings files with the new ones, editing them accordingly.

The TEMPLATES setting has changed, so I updated it in the portal’s settings file.

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

I also compared the urls files and read this: since Django 1.10, urls.py no longer uses patterns(), and urlpatterns is a plain list. https://docs.djangoproject.com/en/1.11/topics/http/urls/#example

I changed both of these and ran the server again.

I got an error, so I removed this line

import community.signals # NOQA

from systers_portal/community/__init__.py

I also got errors with the server and views, so I changed urls.py accordingly.

Then I got a migrations error like this:

django.db.utils.ProgrammingError: relation already exists

So I ran this command:

python manage.py migrate --fake-initial

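Migrations issue solved. (--fake-initial tells Django to mark an app’s initial migration as applied, instead of running it, when the tables it would create already exist.)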

Time to solve the warnings.

Warnings Solved! Yay!

Now it’s time to fix flake8. Ah! What a pain.

The migration of Django and the other packages to their latest versions is done. The tests and flake8 issues still have to be checked; I’m leaving those for 27th October.

 

 

If you are currently using an old version of Django, follow these steps to upgrade.

If you get errors at any stage, don’t be overwhelmed. You have Google; otherwise, ask for help.

Install Python 3.6. On Ubuntu, use these commands:

sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6

You probably already have a virtual environment, so delete the old one and create a new one with Python 3.6:

virtualenv venv --python=/usr/bin/python3.6 --no-site-packages

Activate it.

source venv/bin/activate
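Before going further, it can help to confirm that the activated environment really runs Python 3.6 or newer. The sketch below uses a hypothetical helper, check_environment, of my own. Save it to a file and run it with the environment's python. The virtualenv check is best-effort. venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix, while older virtualenv releases set sys.real_prefix instead.

```python
import sys


def in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 keep sys.base_prefix pointing at the base
    # interpreter; older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def check_environment(minimum=(3, 6)):
    """Return (python_is_new_enough, virtualenv_is_active)."""
    return sys.version_info[:2] >= minimum, in_virtualenv()


if __name__ == "__main__":
    ok, venv = check_environment()
    print(f"Python {sys.version.split()[0]} >= 3.6: {ok}; virtualenv active: {venv}")
```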
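Next, create the PostgreSQL user and database. createuser is run from the shell (typically as the postgres system user), and the SQL statements are run inside psql:

createuser alice --pwprompt
CREATE DATABASE systersdb;
ALTER DATABASE systersdb OWNER TO alice;
or
GRANT ALL PRIVILEGES ON DATABASE systersdb TO alice;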
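Django then needs to be pointed at that database. The fragment below is a minimal sketch that assumes the project uses Django's standard DATABASES setting. The names alice and systersdb match the commands above. Reading the password from an environment variable called SYSTERS_DB_PASSWORD, and the localhost host and default port, are my assumptions and not necessarily how the portal's settings are organised.

```python
import os

# Hypothetical settings fragment for the user and database created above.
# "django.db.backends.postgresql" is the PostgreSQL backend name in Django 1.9+.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "systersdb",
        "USER": "alice",
        "PASSWORD": os.environ.get("SYSTERS_DB_PASSWORD", ""),  # assumed variable name
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```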