The other side of the moon: 2005

Thursday, December 29, 2005

Using the Samsung SP0802N and ASUS K8V-MX together without lockups

Early this year, I started having hard disk problems (signs of an impending crash), so I decided to replace my old 40GB Samsung with a new 80GB Samsung. The drive I installed was a Samsung SP0802N, since I'd heard mostly good reviews of it. I decided to keep both hard disks connected though, just in case.

A few months ago, the computer started showing signs of corrupted RAM. This isn't something that normally happens on two year old RAM. 2 day old RAM, maybe, 10 year old RAM, maybe, but not 2 year old RAM. Power problems are a possibility, and that's not unexpected in my room. Anyway, the system was checked by a hardware guy, and he said that the motherboard needed to be replaced.

The new motherboard was an ASUS K8V-MX and along with that, we got an AMD Sempron processor.

On my next trip back home, I noticed problems with the system. It was running slower, and was locking up on disk intensive processes. A power cycle was required to get it back, and then there was a high chance that BIOS wouldn't recognise my disk, but would grab grub from my old disk. I didn't have time to look at it back in October or November, but in December, I did.

Three things came to my mind.
- bad power,
- bad hard disk/disk controller
- incompatibility somewhere.

We thought the grounding might be bad throughout the house because the stabiliser and spike buster indicated the same at various outlets. I also read through the motherboard manual. I generally do this before installing a new motherboard, but since I hadn't installed this one, I hadn't read it before. The manual said that a BIOS upgrade was required for the board to function correctly, and that MSDOS and a floppy were required to upgrade the BIOS. I had neither, so ignored that for the moment.

Decided to go get a new hard disk and a UPS, but changed my mind about the hard disk at the last moment, and got just the UPS and some more RAM.

The night before I bought the stuff, I moved the PC to a different room to check (I couldn't get it started in my bedroom), and it started up (which further convinced me that it could have been a power problem). I read through /usr/src/linux/Documentation/kernel-parameters.txt for info on what I could do to stabilise the kernel. That pointed me to other docs, one of which told me that a BIOS upgrade was required for certain ASUS motherboards.

Today, I decided to try upgrading the BIOS. I do not have a floppy drive, or MSDOS, so that was a problem. Booted up from the motherboard CD, which started FreeDOS. FreeDOS, however, only recognises FAT16 partitions, and I had none of those.

Switched back to linux, started fdisk, and tried to create a new FAT16 partition 5MB in size. It created one 16MB in size - I guess it's a least count issue. Had to zero out the first 512 bytes of the partition for DOS to recognise it...

dd if=/dev/zero of=/dev/hda11 bs=512 count=1

Then booted back into FreeDOS and formatted the drive:
format c:

Then booted back into linux to copy the ROM image and ROM writing utility to /dev/hda11, and finally, back to FreeDOS to run the utility.

Ran it, and rebooted to get a CMOS checksum error - not entirely unexpected. Went into BIOS setup and reset options that weren't applicable to my box (no floppy drive, no primary slave, boot order, etc.)

Booted into linux and haven't had a problem yet.

Next step - enable ACPI.

Sunday, November 13, 2005

Why Foss in Education makes sense.

I'm supposed to speak at Foss.in on why FOSS makes sense in education. I chose the topic because it's something I'd worked on while I was still at NCST. The effective use of computers in children's education was very close to me.

This could well be the least technical post on this blog, but I've been having trouble getting coherency in my thoughts and I have to put it down for clarity. As has been the case in the past, this blog becomes a sounding board for me to dribble my thoughts. I'm not looking for comments, just trying to clear stuff in my head.

At Vidyakash 2002, it had been suggested that I get a hold of Seymour Papert's The Children's Machine. Papert worked under Piaget to study how children learn, and the results of these studies were the Logo programming language, and two successful books - Mindstorms, and The Children's Machine - the former being the inspiration behind Lego Mindstorms. I managed to get my hands on this book, and reviewed it for the Vidyakash Newsletter.

It had also started me thinking on how computers were being used in education. What follows are my thoughts.

The role of computers in education

I think before we proceed to decide on the tools to use, we need to know why we need computers in a school. What would we do with them, where would they be used, and who would use them?

I see three basic uses for a computer in a school.

Instruction Delivery
Instruction Enabling
Administration

The first two are those that I'm primarily concerned with, and the remainder of this post will be about those.

Both instruction delivery and instruction enabling are interactions between a student and a teacher, where the latter may or may not exist. A teacher has traditionally been one who delivers instructional content to a student, and enables learning to take place. This generally means that the entire class will follow at the teacher's pace, and according to the teacher's will.

How does a computer fit in here?

Computers are excellent at instruction delivery. Here they primarily replace the text book. However, rather than just throwing static text and pictures onto a computer screen, instructional content may be made far richer through the use of animations, sound, video and simulations. An instruction designer is required to effectively build this content.

Instruction enabling - in my terminology - has more to do with the computer being the target of learning. For example, one cannot teach C programming without a computer (although people have tried). The computer in this case is the laboratory within which students learn to apply the knowledge they've gained in the classroom.

If we concentrate only on computer education for a short part of this post, we could see that it is possible to merge the classroom and the laboratory into a single entity. Instruction delivery and experimentation can take place within a very real environment that is the computer, and in fact, the history of computer education is filled with examples of CBTs and web based courses.

Note: I haven't mentioned FOSS yet.

Now, let's drop the restrictions on our thoughts above and apply this to all forms of education.

Computers have been used in the past to teach Math, English, the sciences and various other subjects, but what has been the model followed?

Do we want the computer to program the child or the child to program the computer?

Too often, we've seen that CBTs flood the child with information that he has to memorise, and then throw tests at him to test his knowledge. He goes further once he's cleared all tests. This looks a lot like the way I program a computer. I throw a whole bunch of data at it, and then I write and constantly refine my code until it processes the data correctly, to give me my expected output.

Do we really want to create a generation of automatons? (automata?)

Instead, Papert shows a different model, and he takes the simple example of learning a language.

A child in Surat learns Gujarati with equal ease as a child in Toulouse learns French. In fact, several children in Surat learn both Gujarati and Hindi with that same ease. My grandma learnt Tamil, Telugu, Malayalam, Hindi and Bengali. At the same time, it's terribly hard for an adult to do the same. Most adults can never pick up a foreign language.

I've been in foreign language classes for adults for three languages, and in all cases, there have been people who pick it up really quickly, and there are those that never do. Invariably, it's the folks who would otherwise be considered childish, who pick up the language quicker. IAC, adult education is not the point here.

Learning is genetic

Papert suggests that the child in Toulouse and the child in Surat inherit learning from their respective environments. Learning through living, as it were (and I had a document to link to about this, but it no longer exists online). As we proceed through life, we pick up experiences through our various sensors - eyes, ears, nose, mouth, touch - and translate them into learning elements stored in our brains. Language learning is no different.

Can we somehow teach Math and Physics in the same way? Can we create a natural environment in which the rules of speech are not grammar and spellings, but mathematical identities or Newton's laws of motion?

Umm, yeah, Logo does that. It creates a Mathland and a Physicsland where children can learn math and physics by playing. The results are amazing, and terribly scary for teachers. Teachers need to accept that for once, a child may learn in an unplanned way. A child may come to an innovative solution that the teacher hadn't envisioned, in much the same way that Gauss is said to have summed the integers from one to a hundred as a young schoolboy.

Teachers need to be prepared to sit down and figure out a problem and its solution along with the student, and not by themselves, only to proclaim the solution hours later. It's the process of figuring it out that creates learning, not the process of listening to a clean room solution.

Debugging one's mistakes

Some experiences excite us and accelerate learning, while others scare us and slow it down, sometimes stopping it permanently. All too often, our teaching systems are designed to make children afraid of learning. We punish them when they make mistakes rather than showing them how to debug their errors and move towards a solution.

Enter FOSS.

FOSS is great for learning because the source code is available. Not just for reading, but for modification, and experimentation.

Those last two points are what make pure foss projects different from source visible projects.

It's important to note that price is not an issue here. Good software costs money, and can well cost a lot of money. One must be prepared to pay for the quality that one expects. The important gain that comes along with foss is the free laboratory that you get along with it.

We move back to computer education, because it's the easiest example to start with when talking about software.

For every topic within the computer science and engineering umbrella, we have several foss projects that may act as the virtual 'lands' that we require. Each project can be a land in which the natural spoken language is the topic to be learnt. Operating Systems, Databases, Networks, Graphics, Communication, Multimedia, the list goes on. We have hundreds of lands, often overlapping, and the overlap can be a learning experience in itself. Much like a bunch of my cousin's kids from the UK who learnt Konkani after a month in Goa.

This is already happening in colleges, but at lower levels, mistakes are being made. Rather than giving them the ability to learn anything computer related, students are taught specific tools which leaves them vulnerable to change.

It's like teaching English poetry by covering only works by Yeats. The outcome is that students cannot recognise the works of Ogden Nash, or even CSNY as poetry.

Things must change.

Popularity begets obsolescence

When asked to switch from teaching tools like Microsoft Office in favour of generic topics like office automation applications, a common retort is, "Should we not teach the current popular tools?"

The answer is a resounding no. Teaching specific tools, popular or not, leads to obsolescence when those tools cease to be in use, and who's to say that they won't? No one uses Wordstar, Lotus 123 or DBase today, yet these were the tools that we were taught to use in school. What should be the purpose of computer education?

Teach students to learn any tool
Let students learn through hands on experience
Throw responsibility into the hands of students

The idea should be to teach students concepts, and any tool that helps achieve this is good. Students should be exposed to a variety of tools, and the choice of specific tool should be theirs. A student may well choose the tool that gives him the edge when searching for a job.

Students can be put in charge of running the IT systems of a school. This will cut costs in a large way, and these students graduate from school/college with invaluable work experience that others only pick up after a year or two working in industry.

Why FOSS?

The big boys of FOSS all have their basis in education. Linux was started by Linus Torvalds to learn about the 386 architecture, and later to learn more about operating systems. LyX was written as a college project. The Gimp was written because its creators wanted to learn how to do graphical programming, and Gtk+ was born out of it because they wanted to learn how to write a good toolkit.

FOSS fosters education. For the persons contributing to it, and for the persons consuming it. The threshold for a user of Foss to become a contributor is extremely low - if we consider the different forms of contribution possible. Given the right language, it isn't hard for a domain expert to become a contributing developer.

Which brings us to other subjects.

Educational software already exists for non-computer related topics, and there is much FOSS to choose from. Software may be taken up and customised by a school. Specifically, students of higher classes could build or modify software for lower classes. These really do not have to be comp. sci. students. The emphasis here is not on getting the greatest algorithm implemented in code, or to squeeze out the last ounce of power from a low end machine. The emphasis is on applying domain knowledge to create a virtual world, on translating, for example, Newton's laws of motion to a set of rules by which a computer can build a simulation.

In Papert's experience, a child learns by teaching the turtle how to do stuff. The turtle here is a creature in the computer, and the child needs to teach this turtle how to first draw lines, then to use those lines to draw simple shapes, then to use those shapes to draw complex shapes, and further. In order to teach the turtle, the child must first figure out the steps herself, and that's where learning occurs.

As I write this, the same question keeps resounding in my head, "Ok, so this tells us how computers can be used effectively in education, but why Foss?".

The answer stems from the ability of foss to build on another's ideas. Two students from different schools, and even different batches, may collaborate on the same idea. One may use libraries published by the other. The user of the library can gain insight into the ideas that went into building it, and can even suggest alternate approaches based on his or her usage of the library. Vinod Khosla seems to have similar ideas.

Academia is wont to publish findings, results and papers. Foss is merely a solid implementation of that which is already published. Publishing one's learning as a Foss implementation spreads the knowledge and the discussion.

Much like Wikipedia allows users to collaborate on building information, so also, students should be able to collaborate in their learning. The output needn't be completely correct, but it must be debuggable, and therefore free and open.

So who is using Foss in education?

It depends on what we mean by using.

All of Mexico uses Foss in schools.
Several regions in France do too.
Schools in Virginia, Oregon (Portland), and several other states in the US.
Italian elementary schools regularly use Free Software.
Students in Melbourne run the IT systems of the school entirely.

However, none of the above are actual contributors to domain based foss. That's what we need primarily.

Learning is fostered more by doing, teaching and collaborating. Foss is based on all three of these, and is why Foss makes sense for education.

I'm not going to link to locations where one can find free software for education. This discussion has been less about that. It's more about learners contributing their learning as foss to improve the learning of others, and there aren't many links for that.

Other discussions

There have been discussions over the years about foss and linux in education, and stories of successful implementations. These are just a few of the links that I've collected. Most of these talk about implementing linux in a school's IT department.

Citations

This post was cited by the following papers:

S. Chopra and S. Dexter. 2009. Free software, economic 'realities', and information justice. SIGCAS Comput. Soc. 39, 3 (December 2009), 12-26.

Sunday, October 16, 2005

PHP 4.3, DHTML and i18n for select boxes

I'm not a web developer, and I'm not always aware of the quirks that experienced web developers are aware of. Like this one for example.

Populating list boxes through javascript

The standard way to populate a list box using javascript (or at least the way that I learnt) is to create Option objects, and add them to the list, like this:

var o = new Option('text', 'value');
var s = document.getElementById("my-selectbox");
s.options[s.options.length] = o;

Pretty simple, and when you're doing AJAX, it's just easy to return a javascript array, or maybe even the code that needs to be executed, and just eval that in the calling script.

HTML entities in a javascript populated list box

I'd been doing this for a while, when I got a bug report from someone saying that their single quotes were showing up as &#039;. Checked up, and found out that while populating a select box using javascript, you've got to pass it the raw text, barring the primary html entities that is: &lt;, &gt; and &amp; (and possibly &quot;). So, I set about unescaping all text before handing it to the array that was to populate the list box.

Not too much trouble. I was doing it in PHP, and this is the code I had to put in:

?>
// javascript: grab the select box and empty it
var sel = document.getElementById("my-select-box");
sel.options.length = 0;   // delete all elements from sel
<?php
// php: emit one javascript line per database row
while($o = mysql_fetch_array($result, MYSQL_ASSOC))
{
    // unescape html entities, then backslash-escape ' and \ for the js string
    $on = preg_replace("/(['\\\\])/", '\\\\$1',
        html_entity_decode($o['name'], ENT_QUOTES));
?>
sel.options[sel.options.length] = new Option('<?php print $on ?>', '<?php print $o['id'] ?>');
<?php
}
?>

Now don't get confused with the mixture of code in there. The stuff inside the <?php ... ?> blocks is php code, and the stuff outside is javascript.

What that code basically does is, fetch a bunch of records from the database. The names are stored htmlescaped in the database (because that's how we were dealing with them initially). After fetching, we need to unescape them because we're populating a select box using javascript, but because we're enclosing our javascript strings in single quotes, we need the funny looking extra preg_replace to escape single quotes in the string.
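The same escaping rule can be sketched in javascript, for illustration only. The function name escapeForSingleQuotes is made up for this example, and the sketch only handles backslashes and single quotes, not newlines or other control characters.

```javascript
// Hypothetical helper (not from the original code): backslash-escape
// backslashes and single quotes so the text can sit inside '...' in javascript.
function escapeForSingleQuotes(text) {
    // the character class matches one ' or one \ ; $1 puts it back after a backslash
    return text.replace(/(['\\])/g, '\\$1');
}

console.log(escapeForSingleQuotes("O'Brien"));   // O\'Brien
console.log(escapeForSingleQuotes("C:\\temp"));  // C:\\temp
```

Backslashes have to be escaped along with the quotes, otherwise a name ending in a backslash would swallow the closing quote of the string.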

This code worked perfectly (although I should really be putting unescaped data into the database anyway), until, that is, someone tried putting japanese characters into the name field.

What happened was a series of things. To start with, the japanese characters were showing up url encoded everywhere. Now url encoding is different from html encoding. Url encoding is where a character is converted into its hex code and prefixed by a % sign. What I had was the non-standard %u form (the kind that javascript's escape() function produces), where the 16 bit unicode value of the character is written as 4 hex digits (hexits?). In utf-8 itself, japanese characters are typically three bytes long. The url encoded representation of them was something like %u3c4A, and I had a whole bunch of strings that were filled with that.
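To see the two kinds of url encoding side by side, here is a small sketch using javascript's built in escape() and encodeURIComponent() functions. The character used is picked arbitrarily for the example and is not one from the original data.

```javascript
var ch = '\u65E5';  // a kanji, chosen arbitrarily for this example

// escape() writes the 16 bit value of the character as %u plus 4 hex digits
console.log(escape(ch));              // %u65E5

// encodeURIComponent() writes each utf-8 byte of the character as %XX
console.log(encodeURIComponent(ch));  // %E6%97%A5
```

The first form is what ends up in strings like %u3c4A. The second is the standard percent encoding, which uses three bytes for this character.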

URL encoding to HTML entities

A simple regex converted the url encoded data to an html entity:

$text = preg_replace("/%u([0-9A-Fa-f]+)/", "&#x$1;", $text);

Note that there may be major flaws in this code. For starters, I pick the longest string of text after the %u that matches a hex digit. This may not always be correct. I prolly need to test that it's a valid utf8 character first, but I'll leave that for later.
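One possible way to tighten the regex is to take exactly four hex digits after the %u instead of the longest run. The sketch below shows that in javascript. It assumes the %u sequences came from escape(), which always writes four hex digits, and the function name percentUToEntity is made up for this example. Note the x in &#x, which marks the number as hex.

```javascript
// Hypothetical helper: turn each %uXXXX sequence into a hex numeric entity.
// Assumes every %u is followed by exactly 4 hex digits, as escape() produces.
function percentUToEntity(text) {
    return text.replace(/%u([0-9A-Fa-f]{4})/g, '&#x$1;');
}

console.log(percentUToEntity('%u3c4A'));    // &#x3c4A;
console.log(percentUToEntity('%u3c4Abc'));  // &#x3c4A;bc
```

The second call shows why the fixed length matters: the trailing "bc" are valid hex digits, and a greedy match would have pulled them into the entity.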

Ok, so the url encoded text is converted to an html entity, and I can do this before it goes to the db, and everyone is happy. Not really. Remember that the list box populated via javascript doesn't like html entities either. It needs plain text.

My code above should have fixed that, except that it wasn't prepared for utf-8 strings. The default character set used by php 4's entity functions is iso-8859-1. Looked up the docs (I'm not a PHP programmer), and found that the third parameter to html_entity_decode was the charset, so, I changed my code to this:

html_entity_decode($o['name'], ENT_QUOTES, 'UTF-8')

But, instead of working, I got a whole bunch of errors telling me that PHP 4.3 didn't support UTF-8 conversions for MBCS (multi-byte character sets). This was a combined WTF and Gaaah moment. I'd promised I18N support, and now either PHP or javascript was playing spoilsport.

The exact error message I received (for the search engines to index) is this:

Warning: cannot yet handle MBCS in html_entity_decode()!

Tried several things. Found out that javascript's unescape function could in fact handle the string, and I tried wrapping my text with a call to unescape(), but it didn't work.

The Eureka moment

The Eureka moment was when I realised that it was only list boxes populated via javascript that had this problem. If I had a list box completely populated with html, then there was no problem.

I'd been using javascript arrays all along because it's far cheaper to send them across a network and process on the client side than it is to send complete rendered html. Still, it seemed like I had little choice.

I decided to take advantage of the innerHTML property of all container elements, and add the options to the select element as an HTML block. This didn't work, and made the select box have one single long option.

My next thought was to tackle the select element's parent and dump the entire list box as its innerHTML. There were other problems with this approach though. Some select boxes were in a container element along with other elements, and I didn't want to have to draw all of them. One of the select boxes had javascript event handlers attached to it from the window's onload handler, and deleting and recreating the select box would effectively wipe out those handlers. Still, this was the best approach.

So, I decided to first surround all my selects by a div that did nothing else, and build the entire select box and options list as an html block, and stick that into the innerHTML property of the div. Finally, I had to reattach any event handlers that had existed before. This was simply a matter of saving the event handler before wiping out the select box, and adding it again later.

Here's my final code:

var sel = document.getElementById("my-select-box");
var fn = sel.ondblclick;
var sp = sel.parentNode;

// sel is useless after this point because we basically wipe it from memory

// get all html up to the start of the first option
// just in case we have more than just this select box in the parent
var shtml = sp.innerHTML.substr(0, sp.innerHTML.indexOf("<option"));

// get all html from the closing select onwards
// for the same reason as above
var ehtml = sp.innerHTML.substr(sp.innerHTML.indexOf("</select>"));
<?php
while($o = mysql_fetch_array($result, MYSQL_ASSOC))
{
$on = preg_replace('/%u([0-9A-F]+)/i', '&#x$1;', 
preg_replace("/(['\\\\])/", '\\\\$1', $o['name']));
?>
shtml += '\n<option value="<?php print $o['id'] ?>"><?php print $on ?></option>';
<?php
}
?>

sp.innerHTML = shtml + ehtml;

// need to reget the element because its location in memory has changed
// reset the ondblclick event handler
document.getElementById("my-select-box").ondblclick = fn;

Finally, code that works, and shows all characters correctly, and lets me go home to have lunch.
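The wrapper div idea can also be sketched purely on the client side. This is an illustration under assumptions, not the code above: buildOptionsHtml, rebuildSelect and the rows array are hypothetical names, the names in rows are assumed to be html escaped already, and the select is assumed to carry no attributes other than its id.

```javascript
// Hypothetical helper: build the option tags as one html string.
// row.name is assumed to be html escaped already, so entities pass through untouched.
function buildOptionsHtml(rows) {
    return rows.map(function (row) {
        return '<option value="' + row.id + '">' + row.name + '</option>';
    }).join('\n');
}

// Hypothetical helper, browser only: redraw the select inside its own wrapper div
// and put the saved ondblclick handler back on the new element.
function rebuildSelect(wrapperId, selectId, rows) {
    var handler = document.getElementById(selectId).ondblclick;
    document.getElementById(wrapperId).innerHTML =
        '<select id="' + selectId + '">\n' + buildOptionsHtml(rows) + '\n</select>';
    document.getElementById(selectId).ondblclick = handler;
}

// The string building part runs anywhere; rebuildSelect needs a browser document.
console.log(buildOptionsHtml([
    { id: 1, name: 'O&#039;Brien' },
    { id: 2, name: '&#x3c4A;' }
]));
```

With a wrapper that holds nothing but the select, there is no need to slice the parent's innerHTML around the option tags.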

I'd also tried stuff with the DOM, like using createElement, createTextNode and appendChild, but it had the same problems as before - javascript wants unescaped strings.

Now, maybe this is just a problem with firefox, I'm not sure as I hadn't tested in IE, but anyway, I found it and fixed it for me, maybe it will help someone else.

Ciao.

Friday, May 13, 2005

Using your iPod through FreeBSD

If you're a die hard linux or FreeBSD fan, or are just a geek and happen to have an iPod, then you have an opportunity to play. Getting your iPod to work with FreeBSD or linux can bring you much glee. Unfortunately for you, I'm going to shatter your hopes and tell you how to do it.

This howto is about FreeBSD 4.10 and a USB iPod shuffle, because that's what I have. Other kinds of iPods will also work without change in the procedure. If you have an older FreeBSD, it may or may not work this way, if you have a newer FreeBSD, it will work better.

Linux should be far simpler, and there are already instructions on the net, so I won't cover that.

So, to start with, you've got to make sure your kernel supports a few things.
- usb (if you have a USB iPod, but put it in anyway)
- ohci/uhci/ehci (the first two for USB 1.x, depending on the type you have, the latter for USB 2.x, put all three in, it won't cause problems)
- firewire (if you have a firewire iPod)
- sbp (serial bus protocol)
- scbus, da, cd, pass
- umass (mass usb storage)
- msdos file system support

To do this, edit your kernel config file, which should be in /usr/src/sys/i386/conf/KERNELNAME or something like that. uname -a will tell you for sure.

Add the following to the end of the file:

# For the iPod
options     MSDOSFS
device      sbp
device      firewire
device      scbus
device      da
device      cd
device      pass
device      uhci
device      ohci
device      ehci
device      usb
device      umass

Of course, check in advance that these aren't already in the file.

Once that's done, reconfigure and rebuild your kernel:

config KERNELNAME
cd ../../compile/KERNELNAME
make depend
make
make install

Also tell your system to load msdos file system at boot up. In /boot/loader.conf, add this:
msdos_load="YES"

Next, install gtkpod. You can portinstall it. Get version 0.88 at least if you need support for the iPod shuffle.

Now, plug in your iPod. The USB iPod should throw some messages into dmesg saying that the Apple iPod is loaded on bus:0, target:0 and lun:0 (or something else). Take note of these numbers.

Now, you can mount the drive as msdos. It should be attached to /dev/da0s1 or something like that. You'll know for sure from the dmesg messages. Run dmesg, and note whether the iPod is on da0 or da1 or something else. That's the drive your iPod is attached to. It will always be s1, ie, the first BIOS partition, known in FreeBSD as slice 1. On linux it could be as simple as /dev/sda1

Create a mount point for the drive:

mkdir /mnt/ipod

Mount the drive with the msdos filesystem - assuming this is a windows formatted iPod. If you have a mac formatted iPod, you'll have to do one of the following:
- Add HFS support to your kernel (4.10 may not have this support), or
- Reformat it as a windows iPod using windows, mac or pc-fdisk on FreeBSD/Linux. I wasn't very successful running pc-fdisk on the drive in FreeBSD, but that's prolly because of a bad USB implementation in 4.10

You'll find instructions about formatting at the gtkpod site.

To mount use this:

mount -t msdos /dev/da0s1 /mnt/ipod

Now run gtkpod, and follow its interface.

Once you're done with gtkpod, quit, and unmount the drive. The orange light might still blink, indicating that the drive is not ready to be removed. Use camcontrol to eject it:

camcontrol eject 0:0:0

The numbers at the end are your bus:target:lun numbers, so use the ones you noted down at the start.

Once you've done that you should be able to unplug the iPod and use it.

That's it. Have fun.

Thursday, February 24, 2005

Security in Linux

(from the linux security FAQ)

Glossary:

Cracker: someone who gains unauthorised access to a system. Not to be confused with a hacker. A hacker is really someone who likes to play with computers and write good software. The media often tends to confuse the two. Hackers create, Crackers break.
IDS: Intrusion detection system. A system that tries to detect if your system has been compromised, and warns you of it.
Tripwire: A kind of IDS that checks whether critical system binaries and configuration files have been modified or not.
Firewall: a system that filters traffic moving from outside the network to inside, and vice-versa.
Port scanner: a program that checks a host to see which ports are open for external connections. It generally does a blind connect on all ports of a host. Some port scanners can do stealth scanning.
Security scanner: a program that checks a host for known vulnerabilities. Security scanners generally try to exploit a vulnerability without causing any harmful effects that would happen in a genuine break in. Some exploits are designed to crash a system, and in these cases, the security scanner may well have to crash a system if it is vulnerable. It is better though to be crashed while scanning your own system, than when someone is actually trying to crash you.

Introduction

Some of the most common questions asked by people trying to secure their linux systems are: What is security? How can I protect myself from a break-in? How secure should my linux box be? Which is the most secure linux distribution?

Security, in a nutshell, is ensuring the Confidentiality, Integrity, and Availability of your systems and network.

In short, to protect yourself, you should install the most recent security patches for your distro, turn off unused/unrequired services, run the others through tcpwrappers, and instead of telnet/ftp for remote access, use a secure alternative. The rest of this document will attempt to cover in more detail how to go about securing your linux system.

Most important is to decide how secure you need to be. You need to assess the risk to you, and base your security on that.

risk=(threat*vulnerability*impact)

Threat: the probability of being attacked
Vulnerability: how easy is it to break in
Impact: what is the cost of recovering from an attack
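As a worked example of the formula, assume each factor is scored on a scale of 1 (low) to 5 (high). The scale, the two machines and their scores are made up for illustration and are not part of the FAQ.

```javascript
// risk = threat * vulnerability * impact
// The 1 to 5 scoring scale is an assumption made for this example.
function risk(threat, vulnerability, impact) {
    return threat * vulnerability * impact;
}

// Hypothetical scores for two machines
var webServer = risk(5, 3, 4);  // exposed to the internet, costly to rebuild
var desktop = risk(1, 2, 2);    // behind the firewall, little of value on it

console.log(webServer);  // 60
console.log(desktop);    // 4
```

The absolute numbers mean little. What matters is the ranking, which tells you where to spend your effort first.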

You cannot be 100% secure because there will always be security holes that you either do not know about, or are infeasible to patch.

When picking a distribution with security in mind, you should really pick one that has secure default values that you can tweak later. There's no point in installing a system that someone breaks into before you even have a chance to secure it.

Distributions like Secure Linux and slinux aim to set secure defaults.

Most distros do not have secure defaults because this tends to make the system hard for end users to use. Securing a system is really a trade-off between convenience to your users, and protecting their data.

In general, never rely on the default installation of any distribution. Consult the Linux Administrator's Security Guide for information on how to secure specific distributions.

Alternatively, OpenBSD was designed from the ground up as a secure unix, and is probably your best choice for a pure unix implementation. OpenBSD servers and firewalls are extremely secure.

A good idea would be to set up a rather open internal network, with tight security between the inside and the outside. That way, local users still have all the convenience, while the system is secure from an external threat. There are still two problems with this approach.

If you have legitimate users who need to connect to your system remotely, they would be inconvenienced by your external security. This is a price worth paying, as opening up your system to one person can really open it up to the world.

On the inside too, if your users cannot be trusted, then lax internal security could hurt you. Your users could compromise your system by simply not setting good passwords, or leaving their terminals logged in while they are away. There have been cases when crackers have walked into offices, and found system passwords pasted on the office bulletin board for everyone's convenience. Although hitherto unheard of in India, companies abroad have been known to place spies in competitors' companies to steal corporate secrets. There's no use in having the ultimate in network security if your employee is simply going to copy all your secrets onto a floppy and walk out with it.

Apart from securing each computer system, and the network as a whole, one also needs to physically secure the entire installation.

Firewalls

To protect your network, you'd use a firewall between your internal network and the rest of the world.

A firewall set up is basically a set of rules that tell the firewall whether a given packet is to be allowed through or not. It can also log information on packets passing through, as well as modify or redirect these packets.

Setting up a firewall is very well explained in the linux firewall howto.

In general, you will need to configure ipchains or iptables, depending on whether you have a 2.2 kernel or a 2.4/2.6 kernel.

A firewall is indispensable to the security of a network. Whether it is a dedicated machine or running as a service on another makes a difference.

Since a firewall is meant to filter traffic to and from your network, you ideally want it to sit between your network and the rest of the world. Your firewall would have two network interfaces, one of which connects to your network, and the other to the world.

Firewall rules decide which packets get from one side to another.
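As a rough sketch of what such rules can look like with iptables on a 2.4 or 2.6 kernel: the interface names (eth0 facing the world, eth1 facing the internal network) and the function name firewall_rules are assumptions made for illustration. By default the script only prints the commands instead of running them, so it is safe to try.

```shell
#!/bin/sh
# Sketch of a default-deny policy for a two-interface firewall.
# Assumptions: eth0 faces the outside world, eth1 faces the internal network.
# By default this only prints the commands; run as root with IPT=iptables to apply.
IPT="${IPT:-echo iptables}"
EXT_IF="${EXT_IF:-eth0}"
INT_IF="${INT_IF:-eth1}"

firewall_rules() {
    # Drop anything that no later rule explicitly allows.
    $IPT -P INPUT DROP
    $IPT -P FORWARD DROP
    # Local loopback traffic is always fine.
    $IPT -A INPUT -i lo -j ACCEPT
    # Let replies to connections that were already allowed come back in.
    $IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    $IPT -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Inside hosts may open new connections to the outside, not the reverse.
    $IPT -A FORWARD -i "$INT_IF" -o "$EXT_IF" -j ACCEPT
}

firewall_rules
```

Keeping the rule list this short is in line with the advice below: the kernel does the coarse filtering, and anything service specific is left to tcpd or a proxy.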

A firewall is generally implemented at the kernel level, and can be fast provided it works completely in memory and does not have too many rules. Ideally, you only want your firewall to filter IPs, and let a higher level service handle service based filtering, for example, have tcpd check if anyone is trying to connect to restricted ports on your system, or use a proxy based system to restrict websites that your users may visit. Better logging can be done at these levels, and they are less demanding on the kernel.

Services

Run only the services that you require and no more. On a desktop system, which you will not access remotely, there should not be any services. Run different levels of services on different machines.

You can find out which services are running by using the ps and netstat commands:

ps auxfw will show you a tree structure of processes running, while netstat -ap and netstat -altpu will show you which processes are listening on network ports.

You may also want to do a port scan of your machine using a tool like nmap (remember, Trinity used it in the Matrix Reloaded), or a security scanner like nessus.

Some really unsafe services include rsh, rcp, rexec. Many versions of sendmail and bind have well known security holes. Also disable echo, discard, finger, daytime, chargen and gopher if you don't use them.

Wherever possible, use an encrypted protocol rather than a plain text protocol. For example, use ssh instead of telnet/rsh, use scp instead of ftp, use IMAP w/SSL instead of POP3.

On a single user system, you should also disable identd, but on a multiuser system, this is a good way of tracking down problem users on your system.

You also want to use tcpwrappers to start your services. tcpwrappers are basically an intermediary between inetd and the service that actually serves a connection, like say telnet. Tcpd will check to see if the connecting host is allowed to connect to this service. Different kinds of access control and logging can be done through tcp wrappers.

TCPWrappers

TCPWrappers, and their associated configuration files /etc/hosts.deny and /etc/hosts.allow help a system administrator set up good access control for his system.

First, some background. Most unix systems use what is called a super server to run other servers. The purpose of a superserver is basically to listen on all ports that you want people to connect to, and when a connection is made to that port, it spawns the relevant server. The advantage of such a set up is threefold.

Primarily, all these other servers do not need to implement socket io routines. They simply communicate through stdio, and the superserver connects the socket's io streams to stdio before spawning a server.

Secondly, we keep our process table small by not running all servers all the time. Only one server runs all the time, and servers that are never required are never started. A server that is required is run only for the duration that it needs to serve a connection.

Finally, and really as a consequence of such a set up, we can implement security centrally, and have all servers benefit from it, even if they have no idea that it exists. In fact, these servers know nothing about security at all.

Now, in older systems, the superserver was inetd, or the Internet Daemon. In newer systems, it has been replaced with xinetd, which is simply an extended inetd. xinetd can implement security internally, while inetd spawns an external security handler, most commonly tcpd.

The configuration files for these servers are usually /etc/inetd.conf, and /etc/xinetd.conf together with /etc/xinetd.d/*. We aren't too concerned about the contents of these files, except for which services they start. Most commonly, the superserver will start services like telnetd, ftpd, rlogind, rshd, rstatd, fingerd, talkd, ntalkd, etc. Many of these may not be required, and can be stopped. In inetd, this would involve commenting out the relevant line in inetd.conf, while in xinetd, this would involve setting disable = yes in /etc/xinetd.d/service_name.
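For the inetd case, commenting out a line is easy to script. This is only a sketch: the function name disable_inetd_service and the two sample lines are made up for illustration, and it writes the result to standard output rather than touching /etc/inetd.conf.

```shell
#!/bin/sh
# Print an inetd.conf with the named service commented out.
# Usage: disable_inetd_service SERVICE < inetd.conf > inetd.conf.new
disable_inetd_service() {
    # Only lines that begin with the service name followed by whitespace
    # are touched; lines that are already commented out do not match.
    sed "s/^$1[[:space:]]/#&/"
}

# Example on a made-up two-line configuration:
printf '%s\n' \
    'telnet  stream  tcp  nowait  root  /usr/sbin/tcpd  in.telnetd' \
    'ftp     stream  tcp  nowait  root  /usr/sbin/tcpd  in.ftpd' |
    disable_inetd_service telnet
```

After installing the edited file, inetd has to be told to reread its configuration, normally by sending it a HUP signal.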

Disabling these services altogether will cause an inconvenience for your users. For example, you may want to allow nfs connects from certain hosts within your network, but disable it for everyone else. Furthermore, several services have well known exploits, and detecting when someone is trying these is a good early warning system for a possible attack.

This is where tcpwrappers, or tcpd (the tcp daemon) as it is known, comes in. TCPWrappers are basically wrappers around your services. They are implemented in two ways, either through the tcp daemon, which starts the requested service after doing access control checks, or through libwrap, which may be linked into the server itself. Either way, the wrappers rely on the files /etc/hosts.{deny,allow}.

The full intent and use of tcp wrappers is well documented, and the documentation is shipped with all linux distributions. It can be found in /usr/doc/tcp_wrappers/* or /usr/share/doc/tcp_wrappers/*. Here I will outline the most important usage.

How exactly does tcpd come in to play?

Instead of directly starting the server, inetd can start tcpd, and tell tcpd to start the correct server after performing any checks, etc. that it wants. If one opens /etc/inetd.conf, one will find against the telnet and ftp lines that the daemon to be spawned is tcpd with in.telnetd/in.ftpd as arguments.

telnet  stream  tcp     nowait  root    /usr/sbin/tcpd  in.telnetd

The other values on the line aren't important for this discussion, and you'll figure them out soon enough.

Now, in execve parlance, the first argument passed in the argument vector corresponds to argv[0], i.e., the name that the program should call itself. tcpd takes this hint, and calls itself in.telnetd (which is what will show up if you list running processes). It performs its checks, and then execs in.telnetd, passing all file descriptors on.

Thus, we have tcpd, which has a single access control file, doing checks for most daemons. Furthermore, since tcpd comes into the picture only while a connection is being established, and leaves the scene thereafter, there is no overhead involved (except for that during checking, which is what we want).

Now, not all servers are started through inetd. Many, like sendmail, apache, and sshd, run as standalone servers. These servers can have tcpd compiled into them using libwrap.a and tcpd.h. They will then automatically check with hosts.allow and hosts.deny.

Now all these options must be selected while compiling tcpd and libwrap, but the defaults are decently secure anyway.

To check the configuration of your tcpd wrappers, use /sbin/tcpdchk. Give it the -v flag for more information.

The hosts.{deny,allow} files

Wietse Venema, the creator of tcpd, also developed a 'language' for specifying the access control rules that govern who can use which service. These rules are specified in hosts.allow and hosts.deny. The normal strategy is to deny all connections, and explicitly allow only services that you want people connecting to. For example, your hosts.deny would read like:

ALL: ALL

This means deny all services to requests from all addresses.

Remember that hosts.allow is checked first, then hosts.deny. The first rule that matches is applied, so with the hosts.deny above, a request that finds no match in hosts.allow is denied. If hosts.deny is empty or missing, then the default is to grant access. The extended acl language also allows deny rules to be specified in hosts.allow, so you really only have to manipulate a single file.
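The lookup order is easier to see in a toy model. The sketch below is not tcpd and understands far less than the real rule language: it only handles "daemon: client" rules, the ALL wildcard, exact addresses, and address prefixes ending in a dot. The function names rule_matches and tcpd_decide are made up for this example.

```shell
#!/bin/sh
# Toy model of the tcpd decision order; not a replacement for tcpd itself.
# Only understands "daemon: client ..." rules, the ALL wildcard, exact
# addresses, and address prefixes ending in "." (for example 192.0.2.).

# rule_matches FILE DAEMON CLIENT -> exit status 0 if some rule in FILE matches
rule_matches() {
    [ -r "$1" ] || return 1
    awk -F: -v daemon="$2" -v client="$3" '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            nd = split($1, d, /[ \t,]+/)        # daemon list
            nc = split($2, c, /[ \t,]+/)        # client list
            dmatch = 0
            for (i = 1; i <= nd; i++)
                if (d[i] == "ALL" || d[i] == daemon) dmatch = 1
            if (!dmatch) next
            for (i = 1; i <= nc; i++) {
                if (c[i] == "") continue
                if (c[i] == "ALL" || c[i] == client) found = 1
                # a pattern ending in "." matches any address starting with it
                else if (c[i] ~ /\.$/ && index(client, c[i]) == 1) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# tcpd_decide DAEMON CLIENT ALLOW_FILE DENY_FILE -> prints "allow" or "deny"
tcpd_decide() {
    if rule_matches "$3" "$1" "$2"; then echo allow     # hosts.allow is checked first
    elif rule_matches "$4" "$1" "$2"; then echo deny    # then hosts.deny
    else echo allow                                     # no match anywhere: access granted
    fi
}

# Usage: tcpd_decide sshd 192.0.2.7 /etc/hosts.allow /etc/hosts.deny
```

With "ALL: ALL" in the deny file, anything that is not listed in the allow file comes back as deny, which is the default deny stance described above.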

Rather than go into the details of all possible configurations, I'll just paste my own hosts.allow file here, and explain it line by line.

#
# hosts.allow This file describes the names of the hosts which are
#  allowed to use the local INET services, as decided
#  by the '/usr/sbin/tcpd' server.
#

# allow everyone to connect to 25.  ACL implemented in sendmail
sendmail: ALL

# ssh from certain hosts only.
sshd: 202.141.152.16 202.141.151. 202.141.152.210 127.0.0.1 : ALLOW

# Allow people within the domain to talk to me
in.talkd in.ntalkd: 202.141.151. 202.141.152. LOCAL : ALLOW
in.fingerd: 202.141.151. LOCAL EXCEPT 202.141.151.1 : ALLOW

# Set a default deny stance with back finger "booby trap" (Venema's term)
# Allow finger to prevent deadly finger wars, whereby another booby trapped
# box answers our finger with its own, spawning another from us, ad infinitum

ALL : ALL : spawn (/usr/sbin/safe_finger -l @%h | /bin/mail -s "Port Denial noted %d-%h" hostmaster) & : DENY

The above file starts off by allowing anyone to connect to my sendmail daemon. The sendmail daemon is in a better position to do access control, as this needs to be done based on sender and recipient address rather than IP address. If you suspect that certain hosts are unnecessarily hitting you on 25, then you can block them explicitly.

The next line allows ssh connections from certain specific hosts in the 202.141.152. domain, and all hosts in the 202.141.151. domain. I may need to connect to my machine from different places on my network. These connections would be over a broadcast network, so I prefer ssh for connecting.

I allow talk and finger from within my domain, but finger is not allowed from 202.141.151.1.

Finally, I set a booby trap for anyone connecting to services that they are not authorised to access. A reverse finger is done on the attacking host, and a mail is sent to the administrator of my machine with this information.

Intrusion Detection

Intrusion Detection is the ability to detect people trying to compromise your system. Intrusion detection is divided into two main categories, host based, and network based. Basically, if you use a single host to monitor itself, you are using a host based IDS, and if you use a single host to monitor your entire network, you are using a network based IDS. Most home users would use a host based IDS, while universities and offices would have a network based IDS.

There are many Intrusion Detection Systems (IDS) for linux; the most popular, for both host based and network based detection, is snort. Others are portsentry and lids - the Linux Intrusion detection system [inactive as of 2013]. Going into the details of each of these is beyond the scope of this document, but all tools have very good documentation.

In addition to an IDS, you would also want to use an Integrity checker, which basically makes sure that none of your binaries and critical configuration files have been modified.

When a cracker compromises a system, the first thing he's likely to do is create a backdoor for himself. There have been many instances where critical binaries like the ssh daemon have been replaced with trojaned versions that capture passwords and mail them back to the cracker. This then gives the attacker free access to the system, even if the original hole is plugged.

Tools like tripwire, AIDE, and FreeVeracity check the integrity of your binaries. Of the above, FreeVeracity is reputed to be very easy to set up and use.

Typically, one would create an integrity database when the system is installed, and update it whenever new binaries are installed. The database should always be backed up onto read-only media like a CD. The checker should be run every day through a crontab entry, to check all critical files. If the tool finds any discrepancies, it sends a mail to a pre-defined email address.
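The idea behind these tools can be sketched with nothing more than sha256sum from GNU coreutils. This is a bare-bones illustration, not how tripwire or AIDE work internally: the function names and the paths in the comments are assumptions, and a real checker also tracks permissions, owners and timestamps.

```shell
#!/bin/sh
# Minimal integrity check built on sha256sum (GNU coreutils).

# make_baseline DB FILE... : record a checksum for each file
make_baseline() {
    db=$1; shift
    sha256sum "$@" > "$db"
}

# check_baseline DB : print the files whose contents changed;
# exit status is 0 only if every file still matches
check_baseline() {
    sha256sum --check --quiet "$1" 2>/dev/null
}

# Typical use (paths are examples only):
#   make_baseline /root/baseline.sha256 /bin/ps /bin/login /usr/sbin/sshd
#   check_baseline /mnt/cdrom/baseline.sha256 || echo "binaries changed"
```

Run from cron, any output from check_baseline ends up in a mail to the owner of the crontab, which gives the daily report described above.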

The Integrity Checker should be configured well to prevent false alarms, which make it more of a hindrance than an aid.

So, how do you know whether you've been compromised or not?

CERT has released an advisory to help you identify if an intruder is on your system.

In short though:

Check your log files
Look for setuid/setgid files, especially if they are owned by root
Check what your integrity checker has to say about your system binaries
Check for packet sniffers which may or may not be running
Look for anything you didn't install; if you didn't install it, it shouldn't be there
Check your crontabs and at queues
Check for services that shouldn't be running on your system
Check /etc/passwd for new accounts, and for inactive accounts that have suddenly become active

Full details, including how to do the above are listed in the abovementioned document.
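As one example, the setuid/setgid check from the list above can be sketched with find. The function name list_setid and the baseline path are made up for illustration; the CERT document remains the place to look for the full procedure.

```shell
#!/bin/sh
# list_setid DIR : list regular files under DIR with the setuid or setgid bit set.
# -xdev keeps find on one filesystem; errors from unreadable directories are hidden.
list_setid() {
    find "$1" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
}

# Typical use: save the list once on a clean system, then compare later runs.
#   list_setid / | sort > /root/setid.baseline
#   list_setid / | sort | diff /root/setid.baseline -
```

Anything that shows up in the diff and that you did not install yourself deserves a closer look.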

So what do you do once you know that you've been compromised?

Well, the first thing is not to panic. It is very important not to disturb any trails that the cracker has left behind, as these can all be used to detect who the attacker was, and even exactly what he did. Very importantly, don't touch anything on the system.

Step one is to disconnect the machine from the network. This will not only prevent further attacks, it will also prevent the attacker from covering up his trails if he finds out that he's been caught.

To prevent any data from being changed, you should also mount your file systems read-only.

Copy all your log files out to another system, or a floppy disk, where you can examine them safely.

Analyse the saved data to determine what the attacker did to break in and what he did after that.

Restore your system from known pre-compromise backups.

Again, CERT has published a white paper on recovering from an attack.

Testing Security

There are many commercial organisations that will test the security of your system for you. These are costly though. A cheaper alternative may be to use one of the many web based security scanners to test your system.

http://www.hackerwhacker.com
http://maxvision.net/#free
http://www.grc.com
http://privacy.net/analyze
http://www.secure-me.net/

You shouldn't trust what they tell you, but it will be interesting to monitor your logs and network while an attack is in progress.

You can also test yourself by using your own port scanner and security scanners.

Nmap is the most popular and widely used port scanner around, both by black hats and white hats. It can also determine which OS you use, which is what a cracker would need to know to find OS specific vulnerabilities.

If nmap can't figure out which OS you're using, that could slow down your attacker for a while.

SATAN (Security Administrator Tool for Analyzing Networks) was developed by Dan Farmer of Sun Microsystems and Wietse Venema (of tcpd and postfix fame), then at Eindhoven University of Technology, Netherlands, and currently at IBM, with the specific intent of doing everything that an attacker would do to gain unauthorised access. This tool has been replaced with a next generation version called SAINT.

Nessus has a plugin based architecture. Vulnerability checks are written as plugins, which means that you can check for new holes as they become publicly known, without upgrading your entire binary.

Viruses and Trojans

The real question in this section is, is linux vulnerable to viruses and trojans?

Practically, no. Technically though, it is possible.

Due to the design of Linux, it is difficult for viruses to spread far within a system, as they are confined to infecting the user space of the user who executes them. Of course, this is a problem if infected files are launched by root, but as a security conscious individual, you wouldn't be running untrusted files as root, would you?

It is theoretically possible for a virus launched by a regular user to escalate its privileges using system exploits; however, a virus with this capability would be quite sizable, and difficult to write. As of this date, few viruses have actually been discovered for Linux, and the ones that have been discovered aren't worth losing sleep over. This will undoubtedly change with time.

Worms like l10n and Top Ramen only worked because the systems were insecure to begin with. An insecure ftpd/rstatd was used to automatically gain access to machines, and use them as further launching grounds.

Viruses do exist for Linux, but are probably the least significant threat you face. On the other hand, Linux is definitely vulnerable to trojans.

A trojan is a malicious program that masquerades as a legitimate application. Unlike viruses, they do not self replicate, but instead, their primary purpose is (usually) to allow an attacker remote access to your computer or its resources. Sometimes, users can be tricked into downloading and installing trojans onto their own computers, but more commonly, trojans are installed by an intruder to allow him future access to your box.

Trojans often come packaged as "root kits". A "root kit" is a set of trojaned system applications to help mask a compromise. A root kit will usually include trojaned versions of ps, getty, passwd.

At this point in time, virus scanners for Linux are aimed at detecting and disinfecting data served to Windows hosts by a Linux file/mailserver. This can be useful to help stop the spread of viruses among local, non-Unix machines. Due to the lack of viruses for Linux, there are presently no scanners to detect viruses within the Linux OS, or its applications. Trojans present a greater threat to the Linux OS itself than do viruses, and can be detected by regularly verifying the integrity of your binaries, or by using a rootkit detector.

Trojan Detectors:

Chkrootkit: Checks Linux system for evidence of having been rootkitted.

Root Kit Detector: A daemon that alerts you if someone attempts to rootkit you.

Virus Scanners for Linux File Servers:

AMaViS: A sendmail plugin that scans incoming mail for viruses.

AntiVir for Linux: Scans incoming mail and ftp for viruses.

Interscan Viruswall: A Firewall-1 add-on that scans ftp, http, and smtp for viruses.

Sophos AntiVirus: Checks shares, mail attachments, ftp, etc. for viruses.

Finally, a system administrator must understand that security is a process. You need to keep yourself up to date with all the latest security news. Subscribe to the securityfocus, cert, and other security related mailing lists. Stick to the comp.os.linux.security newsgroup. That's also a good place to post your queries - if they haven't already been answered (hey, most of this doc was from the faq in there).

Monitor your log files regularly. Use remote logging to protect against modified log files. Protect your system binaries. Keep them on read-only partitions if required.

The only way to protect yourself completely, is to be aware of what is happening all the time.

References:

The comp.os.linux.security faq
The linux security howto
The linux administrator's security guide
The linux firewall and proxy server howto
CERT advisories
Security Focus

Monday, January 24, 2005

End of line backslash on blogger

If a blogger post has a line that ends with a backslash, blogger will delete the backslash and the following newline character to merge two lines.

eg:
line1 \
line2

shows up as:

line1 line2

after posting, and in further edits.

They seem to be parsing the input as if it were a unix command line or something like that.

The solution is to put a space after the \

Saturday, January 15, 2005

You've got mail! - loud and clear

You've got mail, announces the cheerful voice at AOL.
People who don't use AOL as their ISP will at least have seen it in advertisements, and in the movie too.

AOL's program doesn't tell you anything more than that though. Who's the mail from, what's it about, nothing. To do that, one needs to parse a mailbox for the sender and subject, and then use a TTS tool to say it out loud.

Today I installed festival. It's a pretty cool TTS tool — runs on various unixes, which means prolly MacOSX as well.

I played around with festival for a few minutes while additional voices downloaded, and then hacked up this:


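#!/usr/local/bin/bash

lock=/tmp/newmailnotify.lock
[ -e "$lock" ] && exit
touch "$lock"

# Print the From: and Subject: headers of every message, keep only the
# last pair, tidy them up for speech, and read them out.
# A trailing | continues the command, so no backslash is needed.
awk ' /^From / {from_start=1;sub_start=1}
      /^From:/ && from_start==1 {print; from_start=0}
      /^Subject:/ && sub_start==1 {print; sub_start=0}' /var/mail/philip |
   tail -n2 |
   sed -e '1i\
You'\''ve got mail
s/:/ /;s/ R[eE]://g;s/$/./' |
festival --tts

rm -f "$lock"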

Attached it to my Inbox Monitor, to run every time the mailbox size increased, and now I have a (rather drab) British voice announcing my new mail, along with who it came from, and what it's about.

Yes, the script could do with improvements. I'm currently too lazy to figure out why case insensitive matches aren't working with sed or why I can't use alternation in my regexes, but hey, it's past 2:30am

Comments and suggestions welcome.

Oh yeah, I planned on using a single lock file across users, because:
a. The audio device would be busy anyway
b. Parsing large mailfiles takes a lot of time and is disk intensive. I don't want more than one of these to run at a time.
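One caveat with a test-then-touch lock file is that two copies started at the same instant can both pass the test. A common alternative, sketched here and not what the script above does, is to use mkdir, which either creates the directory or fails in one step. The lock path and the function name with_lock are made up for this example.

```shell
#!/bin/sh
# Take a lock by creating a directory: mkdir either succeeds or fails atomically,
# so two scripts started at the same moment cannot both get the lock.
lockdir="${TMPDIR:-/tmp}/newmailnotify.lock.d"   # hypothetical path

with_lock() {
    mkdir "$lockdir" 2>/dev/null || return 1     # someone else is already running
    "$@"                                         # run the command while holding the lock
    status=$?
    rmdir "$lockdir"
    return $status
}

with_lock echo "announcing mail" || echo "another instance is running"
```

The single shared lock still serves both purposes from the list: only one process uses the audio device, and only one parses a large mailbox at a time.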

Update:
Festival was having trouble with Indian names, and some of the mailing lists I'm on, so I added some entries to its lexicon. Unfortunately, I couldn't figure out how to get those entries loaded. .festivalrc did everything except select my lexicon. I think it selected the default lexicon after selecting mine.

The only solution was to convert my script up there to one that output a festival script (scheme) rather than plain text.

This is what I came up with:


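#!/usr/local/bin/bash

lock=/tmp/newmailnotify.lock
[ -e "$lock" ] && exit
touch "$lock"

# Lead-in message; can be overridden by the first argument.
msg=$1
[ -z "$msg" ] && msg="You've got mail!"

# awk remembers the last From: and Subject: headers and prints the message,
# sender and subject in that order. sed then wraps those lines in a festival
# (scheme) script: lex.select before the first line, SayText around the text.
awk -v msg="$msg" ' /^From / {from_start=1;sub_start=1}
      /^From:/ && from_start==1 {from=$0; from_start=0}
      /^Subject:/ && sub_start==1 {subject=$0; sub_start=0}
      END {printf("%s\n%s\n%s\n", msg, from, subject);}
    ' /var/mail/philip |
   sed -e 's/:/ /
           s/ R[eE]://g
           2,$s/$/./
           /^From/s/ </, </
           1i\
(lex.select "philip")\
(SayText "
           $a\
")
          ' |
festival --pipe

rm -f "$lock"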

and this is what my .festivalrc file looks like:

(lex.create 'philip)
(lex.set.phoneset 'mrpa)
(lex.set.lts.method 'oald_lts_function)
(lex.set.compile.file "/usr/local/share/festival/lib/dicts/oald/oald-0.4.out")

(lex.add.entry '("sachin" n ((( s a ) 0) (( ch i n ) 1))))
(lex.add.entry '("vinayak" n (((v ii) 0) ((n ai) 1) ((@ k) 1) )))
(lex.add.entry '("amarendra" n (((a m) 0) ((@) 0) ((r ei) 1) ((n d r @nil) )))
(lex.add.entry '("vijay" n ((( v ii ) 0) (( ch ei ) 1))))
(lex.add.entry '("ilug-bom" n (((ai ) 1) ((l @ g ) 1) ((b o m) 0) )))
(lex.add.entry '("linuxers" n (((l i) 0) ((n @ k s @ r z ) 1) )))

Interestingly, it reads out mm.ilug-bom as millimetres dot i-lug-bom.

The other changes in the script allow you to customise your leadin message, and also ensure that From is read out before Subject.

Festival has an email mode, but modes only work when reading from a file or using the (tts 'filename mode) syntax. Since my input comes from stdin, there's no way to specify it.

Update 2:

Inspired by jace, I decided to try using procmail for this. The only change to the script is that /var/mail/philip is no longer in there. It reads from standard input. My procmail recipe looks like this:

:0 c
* ^From:
| /home/philip/bin/newmailnotify.sh

and I put it at the end of .procmailrc.

I haven't yet been inundated with a deluge of emails, so don't know how it will work with bulk downloads. This of course runs after mails are sorted into folders, so only those that still make it to my inbox get reported.

Friday, January 14, 2005

Sigdashes

Sigdashes are a (de facto) way of specifying where your mail ends and your signature starts. They're pretty cool, because smart mailers and newsreaders can do funky things when they notice sigdashes.

For example, many mail clients will strip off old signatures when replying to mails. This is a Good Thing, because, hey, just one signature per mail ya?

Many mail clients, like mutt, can display signatures in a different colour or font.
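Stripping a signature is easy precisely because the delimiter is fixed. A minimal sketch, assuming the message body arrives on standard input and using the delimiter described below (dash dash space on a line of its own); the function name strip_signature and the sample message are made up, and this version cuts at the first sigdashes line it finds.

```shell
#!/bin/sh
# strip_signature : copy stdin to stdout, dropping the sigdashes line and
# everything after it. The pattern is anchored at both ends, so "--" without
# the trailing space does not count as a signature delimiter.
strip_signature() {
    sed '/^-- $/,$d'
}

# Example on a made-up message:
printf '%s\n' 'See you at the meet.' '' '-- ' 'Philip' 'philip@example.org' |
    strip_signature
```

The trailing space matters: a line with just two dashes is left alone, which is why the sequence has to be reproduced exactly.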

So, what /are/ sigdashes?

The character sequence "dash dash space" on a line by itself is known as sigdashes. It looks something like this (without the quotes):
"-- "

Configuring your mail client to use sigdashes:

Pine:
Setup | Config
- Composer Preferences | Enable Sigdashes
- Reply Preferences | Strip From sigdashes in reply

Mutt: (sigdashes on by default)
in .muttrc, add
set sig_dashes=yes

Unless it's set to "no" in /etc/Muttrc or ~/.muttrc, you do not need to do anything.

Thunderbird (via TagZilla):
In the TagZilla | Formatting screen, set Tagline Prefix to (without quotes)
"\n-- \n"

Thunderbird (no TagZilla) / Evolution / Web based mail:
Include the sigdashes line as the first line of your signature file/text.

Kmail / Outlook Express:
(No idea)

Go forth and spread the good news.