Building Dynamic Websites at Harvard – Lecture 1

>>All right, welcome back to Computer Science E-75. This is Lecture 1, in which we actually dive into PHP. So you pull up your browser, type www.google.com, and hit Enter. Can we play back the story of what happens, and try to impress everyone with as much technical detail as possible, one step at a time? Give me one step in this process. You have hit Enter; what happens? Yes.
>>Communication with the DNS server.
>>OK, so there's some communication with the DNS server, whereby your browser asks the local operating system, "What is the IP address of google.com?" If your operating system itself does not know, it in turn asks the local DNS server. And who typically owns or controls these DNS servers? [ Inaudible Remark ] Yeah, your ISP. So Verizon, Comcast, Harvard, your company, anyone along those lines. And if your company or your ISP does not know what the IP is for google.com, what happens next? Yup.
>>They probably know another DNS provider that knows, so they may direct the request there.
>>Excellent. They probably know some other DNS server, so they ask a bigger fish, followed by a bigger fish, and so forth. In the worst case, these are the root servers, which at least know where the other authorities are for the various .coms, .nets, and .orgs. And the reason that all works is that when you buy google.com or your own personal domain, you at least have to tell your registrar what? Yeah.
>>The DNS server for where you're hosting your website.
>>The DNS servers of the hosting company or whatnot, where your website lives, and those are typically called NS1 and NS2, just by convention. But the important detail is that there are usually two DNS servers that in turn know your website's IP address, your domain names, e-mail servers, and the like.
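The lookup chain described above can be sketched as a toy model in Python. It is only an illustration under stated assumptions: the server labels (`root`, `tld-com`, `ns1.google.example`) and the answer `203.0.113.10` (a documentation-range address) are hypothetical, and a real resolver speaks the DNS protocol over the network rather than reading a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical name servers: each maps a domain suffix it knows about to either
# a referral ("ask this other server") or a final answer (an IP address).
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"google.com.": ("referral", "ns1.google.example")},
    "ns1.google.example": {"google.com.": ("answer", "203.0.113.10")},
}


def resolve(name: str, cache: Dict[str, str], trace: List[str]) -> str:
    """Resolve `name`, checking the local cache first, then walking referrals from the root."""
    fqdn = name if name.endswith(".") else name + "."
    if fqdn in cache:
        trace.append("cache")
        return cache[fqdn]

    server = "root"
    while True:
        trace.append(server)
        zone = SERVERS[server]
        # Use the most specific suffix this server knows about
        # (plain string matching; a real resolver compares whole labels).
        known = [suffix for suffix in zone if fqdn.endswith(suffix)]
        if not known:
            raise LookupError(f"{server} has no information about {fqdn}")
        kind, value = zone[max(known, key=len)]
        if kind == "answer":
            cache[fqdn] = value  # remember the answer for next time
            return value
        server = value  # follow the referral to the next server in the hierarchy


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    first: List[str] = []
    print(resolve("google.com", cache, first), first)
    second: List[str] = []
    print(resolve("google.com", cache, second), second)
```

The first call walks from the root to the TLD server to the authoritative server; the second is answered straight from the cache, which is also why a cached address can go stale.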
OK, so now my browser knows the IP address of google.com. What happens next? Yeah.
>>It sends a GET request.
>>Good.
>>Yeah, the root of the actual hard drive.
>>Good. So we told the story of the virtual envelope, a.k.a. packet, that's sent from point A, you, to point B, Google. And inside that envelope is this message, "GET /", and then there's some reminder of the protocol that's being spoken, "HTTP/1.1" or whatnot. What's also inside of that packet? What other information?
>>Could be a reminder of the actual web address that the user typed in.
>>Good, so a reminder of the address that the user typed in, which is the Host HTTP header. And this is crucial for what feature offered by today's web servers? Someone else, yeah.
>>Virtual hosting.
>>Virtual hosting, whereby you can put many websites on the same physical machine, and even on the same IP address, because browsers thankfully will remind the server what host name was actually requested, so that the web server can distinguish between your website and someone else's website and so forth.
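To make the "virtual envelope" concrete, here is a small Python sketch of the text a browser sends for `GET /` and how a server can read the Host header back out of it. It is deliberately minimal: real browsers send many more headers, and real servers must handle far more edge cases.

```python
from typing import Dict, Tuple


def build_request(host: str, path: str = "/") -> str:
    """Return a minimal HTTP/1.1 GET request as text."""
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_request(raw: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a raw request into (method, path, version, headers)."""
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, version, headers


if __name__ == "__main__":
    raw = build_request("www.google.com")
    method, path, version, headers = parse_request(raw)
    print(method, path, version, "-> Host:", headers["host"])
```

The Host value is the piece that lets one server, on one IP address, tell apart requests meant for different sites.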
All right, this virtual envelope goes to Google. Google opens the envelope, so to speak, sees "GET /", and realizes, "Oh, you want the root of our website." In Google's case, that's all the HTML and other assets that compose their home page for searching. And so they respond with a packet of their own, or more packets of their own, inside of which is all that HTML. Your browser receives it, renders it, and the connection is closed.

Now, in terms of more subtle details, browsers these days are fairly smart in that, rather than always asking the operating system (Mac OS, Windows, whatever) what the IP address of google.com is, a browser will typically cache that IP address. This just means it's slightly more efficient than asking the operating system, and certainly more efficient than asking the local DNS servers. But there's a gotcha, and one of the themes of this course will be to point out some of these details. If you are not just a user but actually a web developer trying to build new websites, suppose that the IP address has been cached, but you've since moved the website to another server or another virtual machine. There are these gotchas you might run into. And so one of the recurring themes of any sort of web development, especially in this PHP world, is to constantly be clearing your cache.

One other upside of using Chrome, frankly, for primary development is that it has Incognito mode, which, well, is usually used so you can browse sketchy places online. It can also be used to a developer's advantage in that it will prevent cookies from being saved and other details from being cached. But even then, it's not perfect, and even I often have to quit the browser entirely and clear my cache manually. If you ever notice anomalies happening, or think, "I know I changed that file," it could just be some stupid cache issue. So put that in the back of your mind, so that you don't waste 10 or 20 minutes some night this summer chasing down a bug that you actually already fixed. Caching takes many forms, and DNS is just one of them.
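The stale-cache gotcha can be reproduced with a tiny time-to-live (TTL) cache. In this Python sketch, a clock function is injected so the example is deterministic; the host name, the two documentation-range IPs, and the 300-second TTL are all hypothetical, and real browsers and operating systems each apply their own caching policies.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Remember name -> IP answers until each entry's time-to-live runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, ip: str, ttl: float) -> None:
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller has to ask DNS again
            return None
        return ip


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    cache = TTLCache(clock=lambda: now[0])
    cache.put("www.example.com", "192.0.2.1", ttl=300)
    # Suppose the site moves to 198.51.100.7 at t=60; the cache has no idea.
    now[0] = 60.0
    print("t=60: ", cache.get("www.example.com"))  # still the old server
    now[0] = 300.0
    print("t=300:", cache.get("www.example.com"))  # expired, so look it up afresh
```

Until the entry expires, every lookup keeps returning the old server, which is exactly the "I know I changed that" situation described above.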
All right. So any questions on that big picture of HTTP? None? All right.

So where does this all fit in? This is the picture we essentially just painted verbally. So what's on the end of point B, in this case Google or some other server? One of the most popular web servers out there is Apache. This is freely available software. It can run on Linux computers, Macs, and Windows computers, but it's super common in the Linux and Unix world in particular, and those tend to be the machines used for web servers these days. It is the A in LAMP. LAMP is just a silly buzzword: Linux, Apache, MySQL, PHP. It's just a way of saying, "I'm using all of these various technologies." Common jargon in the industry is to say, "I'm running a LAMP stack," and that just means you have Linux as your operating system, Apache as your web server, and so forth. So there's nothing technical about the term, but we'll be looking at the individual pieces.

One of the latest versions of Apache is 2.2-something. This is the documentation there. I will say, from personal experience, I've never found it the most user-friendly. So frankly, Google is a better friend to me, at least, than Apache's website is, as are websites like stackoverflow.com and serverfault.com. These are wonderful places where smart technical people post generally useful solutions to common problems, so make use of those resources as you see fit.

But what are the kinds of things that you can do with the web server configuration? Well, virtual hosting. This is a representative snippet from a file called httpd.conf. Let me just pull up a little scratch pad so we can type out some notes here; the blackboards are occluded by the projector, so we'll use TextEdit. This just so happens to be the typical name of the configuration file, though you might also see it as apache.conf or apache2.conf. It really depends on your operating system, or the distribution of Linux, for instance, that you're using. But the important takeaway is that this is typically the main configuration file for an Apache-based web server, and incidentally, Microsoft's IIS server has similar features. There's other web server software, but Apache is definitely among the most common.

And here is a representative snippet from that file. What feature for the web server is it apparently implementing, if you can infer? Just guess by reading it. Yeah.
>>First of all, it's port 80, so that's a regular website.
>>OK, good. So you see port 80 at the very top there, which suggests it's indeed a sort of standard website living on a standard port. What else comes to mind? What other feature is being conveyed by this configuration? Yeah.
>>A database.
>>A database? Where are you inferring a database from?
>>Is that port 443?
>>No, 443 is actually used for SSL. So there are two pieces here, and we'll focus on both, but the top one, port 80, is the simpler of the two, so let's look there first.

So I'll put this one up. Virtual hosting, this feature whereby a web server can use the same IP address for multiple websites, is implemented literally by way of a file like this. The top line there is just a comment; the rest is telling the web server, "Hey, define a virtual host, or vhost, on port 80 of any IP address that the server has." The star denotes anything, and in this case it's meant to mean any IP address. This is relevant because if the web server just so happens to have multiple IP addresses, this wildcard character just says: it doesn't matter what IP address the request comes in on; go ahead and listen on port 80 on all of those IPs.
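The "listen on every IP address" idea has a loose analogy in ordinary socket code; this Python sketch is that analogy, not Apache's implementation. Binding to the empty host string (reported as 0.0.0.0) accepts connections on all of the machine's IPv4 addresses, roughly what the star expresses in that configuration, while binding to one address restricts the socket to it. It uses port 0 (any free port) so it runs without administrator rights; a real web server would bind 80 and 443.

```python
import socket


def open_listener(host: str = "", port: int = 0) -> socket.socket:
    """Open a listening TCP socket.

    host "" means "all IPv4 addresses on this machine" (reported as 0.0.0.0);
    a specific address such as "127.0.0.1" limits the socket to that address.
    port 0 lets the operating system pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    with open_listener() as every_ip, open_listener("127.0.0.1") as loopback:
        print("all interfaces:", every_ip.getsockname())
        print("loopback only: ", loopback.getsockname())
```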
Also, especially if you're developing on a local virtual machine, which is increasingly common and is what we'll do in this class, or in various cloud environments, sometimes you do need to know the specific IP address. So just be mindful that sometimes the star is not sufficient unless you have another layer of configuration, which I'll wave my hand at for now because we're just looking at a snippet here.

So this says: listen on port 80, on any IP address that the server has, for incoming requests. Now, when requests do come in to the server, thankfully, they should have that Host: HTTP header that reminds the server what the request was for. So if you skim through some of these lines, and let's skip the top part now, ServerName is where the vhost's name is actually defined. We'll see it down here, too: for the SSL version, the name of this website will be the same. But I've also defined what we call an alias, which is just what, in this case? Yeah.
>>The same site, [inaudible].
>>Exactly. The alias here is just cs75.net, with no www. This is one of the steps necessary to ensure that both www.cs75.net and cs75.net work. The quick story I told on Monday about certain websites just not working with just something.com or the like is because someone did not think to configure a fairly minor detail like this. Again, this is Apache, but other web servers, like Lighttpd, Nginx, and others, have similar features.

So this is one step. And just to tie Monday into tonight, what was the other key detail you need to take care of to ensure that both work, both with www and without? Not a redirect; a redirect really just ensures that users end up at the place you want, whereas here we want both destinations fundamentally to work. [ Inaudible Remark ] We needed a DNS record, an A record in particular. We needed to specify that cs75.net itself has an A record, and that www.cs75.net has an A record or what other type of record? [ Inaudible Remark ] An alias, which we called a CNAME on Monday; CNAME stands for canonical name. Now, this is sort of a corner case: technically, unfortunately, you cannot generally make a CNAME for the root of a domain. cs75.net cannot be a CNAME for something else, but names with a host name in front, like www, ftp, or mail.something.com, can all be CNAMEs. And even that is a bit of an oversimplification: you can technically make cs75.net a CNAME, but things like e-mail tend to break as a result. So let me just make the blanket statement that the bare domain has to be an A record, while www can be an A record or a CNAME. These are just little things you need to keep in mind when setting up, for instance, your own domain name that you just bought.
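A small model shows how an alias resolves and why the bare domain is special. In this Python sketch the zone contents (including the name server `ns1.example.net` and the address `192.0.2.75`) are made up. The rule it checks, that a name holding a CNAME may not hold any other records (RFC 1034), is what gets in the way at the bare domain, which also needs records such as NS and MX.

```python
from typing import Dict, List, Tuple

Zone = Dict[str, List[Tuple[str, str]]]

# A hypothetical zone: each name maps to a list of (record_type, value) pairs.
ZONE: Zone = {
    "cs75.net": [("NS", "ns1.example.net"), ("MX", "mail.cs75.net"), ("A", "192.0.2.75")],
    "www.cs75.net": [("CNAME", "cs75.net")],
}


def lookup_a(name: str, zone: Zone, max_hops: int = 8) -> str:
    """Return the IPv4 address for `name`, following CNAME aliases if needed."""
    for _ in range(max_hops):
        records = zone.get(name, [])
        for rtype, value in records:
            if rtype == "A":
                return value
        cnames = [value for rtype, value in records if rtype == "CNAME"]
        if not cnames:
            raise LookupError(f"no A or CNAME record for {name}")
        name = cnames[0]  # an alias: restart the lookup at the canonical name
    raise LookupError("too many CNAME hops (possible loop)")


def cname_conflicts(zone: Zone) -> List[str]:
    """Names that have a CNAME *and* other records, which DNS does not allow."""
    return [
        name
        for name, records in zone.items()
        if any(rtype == "CNAME" for rtype, _ in records) and len(records) > 1
    ]


if __name__ == "__main__":
    print(lookup_a("www.cs75.net", ZONE), cname_conflicts(ZONE))
```

Making `cs75.net` itself a CNAME would require dropping its NS and MX records, which is where e-mail and other services start to break.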
ServerAdmin: this is just a minor detail, so that if there's ever an error on your website, like a 404 or something like that, and you haven't customized the error message, the footer of the web page is generally going to give the e-mail address of the webmaster, as in webmaster@something.com. In this case, we're telling it to use this address instead, so that it's not something like webmaster, which doesn't exist in our case since we're such a small shop. Next, CustomLog and ErrorLog do what they say: they just specify where you want logs to be stored. The most important line here, though, is perhaps DocumentRoot. Now, this path is kind of crazy long and cryptic; it's just what we decided to do in terms of the layout of our hard drive. All this is telling the virtual host is that the HTML files, PHP files, GIFs, and PNGs for this virtual host, www.cs75.net, live specifically in this directory on the server. Very often this will be a much shorter path for normal people, but we've laid ourselves out fairly hierarchically, which is why it's so long. That's all it means.
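Here is a Python sketch of what ServerName, ServerAlias, and DocumentRoot amount to when a request arrives: pick a vhost by the Host header, then map the URL path to a file under that vhost's document root. The document roots are hypothetical stand-ins for the real (long) path. Falling back to the first listed vhost when nothing matches mirrors Apache's documented behavior for name-based hosting; everything else is simplified.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualHost:
    server_name: str
    document_root: str
    aliases: List[str] = field(default_factory=list)

    def answers_to(self, host: str) -> bool:
        return host == self.server_name or host in self.aliases


def pick_vhost(vhosts: List[VirtualHost], host_header: str) -> VirtualHost:
    """Choose a vhost by Host header, falling back to the first one listed."""
    host = host_header.split(":")[0].lower()  # drop any ":port" (IPv6 literals ignored here)
    for vhost in vhosts:
        if vhost.answers_to(host):
            return vhost
    return vhosts[0]


def file_for(vhost: VirtualHost, url_path: str) -> str:
    """Map a URL path to a file inside the vhost's document root."""
    # normpath resolves "." and ".." against "/", so a request can't climb above the root.
    relative = posixpath.normpath("/" + url_path.lstrip("/")).lstrip("/")
    return posixpath.join(vhost.document_root, relative)


if __name__ == "__main__":
    vhosts = [
        VirtualHost("www.cs75.net", "/srv/www/cs75.net/html", ["cs75.net"]),  # hypothetical path
        VirtualHost("www.example.org", "/srv/www/example.org/html"),
    ]
    site = pick_vhost(vhosts, "cs75.net")
    print(site.server_name, file_for(site, "/index.php"))
```

The alias is what makes the bare `cs75.net` land on the same document root as `www.cs75.net`.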
All right, any questions? And again, this is something that, for the first project, you'll have an opportunity to tinker with and even break if you want, and you'll be able to restore it rather easily.

All right. So the virtual host on port 443 is a little more interesting, but it's also mostly a duplicate. A few lines are new, though; which ones jump out at you? Obviously, all the SSL stuff at the bottom. SSL is kind of a pain to set up, at least with certain web servers, because you have to configure a few files. So what is SSL? SSL is Secure Sockets Layer. This is the protocol that websites use to communicate securely with browsers. But what is necessary before you can actually use SSL on your website? Does anyone know? What's involved in doing this? Yeah.
>>I think you need to distribute a certificate that the user will have to get.
>>Exactly. You need a certificate that the user's browser will get from you. Thankfully, that part is all automatic. So how do you go about getting an SSL certificate? There are a couple of things you can do. You can either create one and sign it yourself, so to speak, or you can pay someone else.

Have you ever been to a website where the browser, upon visiting, yells at you with something like "This website cannot be trusted" or "You should not go here"? That's because that website probably doesn't have a certificate that was signed by what's called a certificate authority. And I think I can actually simulate this; I just happened to come across this the other day because I wanted to make one of my university websites run over SSL. So let me open up Chrome here, type in https://cs.harvard.edu, and hit Enter. Perfect, a perfect example. Ironically, the CS department has not paid for what's called an SSL certificate. I will fix this, but it's a great demonstration.

So what does this mean? It means that the site isn't necessarily insecure, per se. It pretty much boils down, and this is somewhat pessimistic, to the fact that we have not paid for an SSL certificate. We have created an SSL certificate, and that's just a command: on a Linux computer, you typically run a command called openssl with some fairly arcane command-line arguments and hit Enter. That gives you what's called a public key and a private key. What does that mean? For our purposes here, just know that there's a fancy mathematical relationship between this thing called a public key and a private key. They're really just big numbers, and mathematically, people on the internet can use CS75's public key to encrypt information to us.

So if some random user is trying to visit the Harvard CS website, their browser will automatically say to cs.harvard.edu, "Can I please have your public key?" And the server will send it, for free and publicly over the internet; it's not something that's secret. A public key is, by definition, meant to be public. That browser will, behind the scenes and unbeknownst to the user, use that public key, that big number, to encrypt its request. The request can be something simple like GET /, and that's literally all my request just now was, but it encrypts it nonetheless. And you could probably guess: what is the only number in the world that can decrypt something that's been encrypted with the public key? The private key. That's something that my server, or the CS department's server, keeps to itself. You don't give it out, and the web server is never going to send it; it's stored somewhere on the hard drive. Mathematically, that key will be used with a formula to essentially reverse the effects of the encryption, so that what the CS department's web server finally sees is GET / or whatever it is the user wants.
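The public/private key relationship can be demonstrated with "textbook" RSA and deliberately tiny primes. This Python sketch is for intuition only and is completely insecure: real keys are thousands of bits long, real systems add padding, and in practice TLS uses the server's keys to set up a shared session key rather than encrypting every request with RSA directly. The primes 61 and 53 are just the classic classroom example.

```python
from math import gcd
from typing import Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_keys(p: int, q: int, e: int = 17) -> Tuple[Key, Key]:
    """Build a toy RSA key pair from two primes: public (e, n), private (d, n)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse of e; requires Python 3.8+
    return (e, n), (d, n)


def apply_key(key: Key, message: int) -> int:
    """Encrypt or decrypt: both are modular exponentiation with the given key."""
    exponent, n = key
    return pow(message, exponent, n)


if __name__ == "__main__":
    public, private = make_keys(61, 53)
    secret = apply_key(public, 65)             # anyone can do this with the public key
    print(secret, apply_key(private, secret))  # prints: 2790 65
```

Anyone holding `(17, 3233)` can produce 2790 from 65, but only the holder of the private exponent 2753 can turn 2790 back into 65.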
And conversely, it works in the other direction, too: behind the scenes, the browser and server use these keys to agree on a shared secret, so that traffic in the opposite direction is encrypted as well.

So what's the takeaway here? We did all that in the CS department, but we didn't pay someone else to certify that we are Harvard University's CS department. The way SSL works at a higher level is that there is this chain of trust that humans in the world have tried to build up, whereby there are big companies, Verisign is one of them, GoDaddy is another, and maybe even Namecheap does this, even more cheaply than others. These fairly big entities in the world charge you money to then stamp, so to speak, your certificate as valid. What does that mean? They digitally sign it. There's actually some interesting mathematics involved, but at the end of the day, it's in part a marketing thing, whereby we, the whole world of internet users, are trusting that if Verisign says this SSL certificate belongs to cs.harvard.edu, and I trust Verisign, then I should trust this website.
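"Digitally signing" a certificate can be sketched with the same toy RSA idea: the certificate authority signs a hash of the certificate with its private key, and a browser checks that signature with the CA's public key, which it already holds in its trust store. In this Python sketch everything is an assumption for illustration: the primes (the 10,000th and 100,000th primes) are far too small to be secure, the CA names and certificate text are hypothetical, and real signatures use standardized padding rather than reducing the hash modulo n.

```python
import hashlib
from math import gcd
from typing import Dict, Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_keys(p: int, q: int, e: int = 65537) -> Tuple[Key, Key]:
    """Toy RSA key pair (public, private) from two primes. Far too small to be secure."""
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    return (e, p * q), (pow(e, -1, phi), p * q)


def digest(data: str, n: int) -> int:
    """Hash the certificate text and squeeze it below the modulus (toy shortcut)."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % n


def sign(cert_text: str, ca_private: Key) -> int:
    """What the CA does after it has (somehow) verified who you are."""
    d, n = ca_private
    return pow(digest(cert_text, n), d, n)


def verify(cert_text: str, signature: int, issuer: str, trust_store: Dict[str, Key]) -> bool:
    """Accept the certificate only if a CA we already trust signed exactly this text."""
    if issuer not in trust_store:
        return False  # unknown CA: the browser shows a warning instead
    e, n = trust_store[issuer]
    return pow(signature, e, n) == digest(cert_text, n)


if __name__ == "__main__":
    ca_public, ca_private = make_keys(104729, 1299709)
    trust_store = {"Example Trusted CA": ca_public}  # shipped with the "browser"
    cert = "subject=cs.harvard.edu; public_key=(hypothetical)"
    sig = sign(cert, ca_private)
    print(verify(cert, sig, "Example Trusted CA", trust_store))
```

Changing a single character of the certificate, or presenting it under an issuer the trust store does not contain, makes verification fail.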
Now, how does Verisign do the authorization? Well, some of these registrars, or sellers of SSL certificates, will go to reasonable lengths to make sure: they'll call you on the phone, or they'll check some business records. That's what you get if they're really being diligent. But the reality is that typically all they do is send an e-mail to whoever is on file as the owner of the domain name, and in this case, that's Drew Faust or someone like that for harvard.edu. That person has to say, "Yes, I own this domain, and I approve the digital signing of this certificate." Then you get back your digitally signed certificate. What you do as the system administrator is install that digitally signed certificate, which frankly is a big number supplemented by another big number, on your web server, using the syntax that we just saw and will see again in just a moment.

So how do you get this certificate? Well, you can go to someone like Verisign, so let's do that: Verisign.com. Here we have, let's see, lots of products. Oh, here we go: Buy SSL Certificates. You know it's going to be expensive when they don't tell you the price right away on the page, so let's compare all SSL certificates. OK, so what do we get? Let's see... they're still not... oh, there we go. So here's what an SSL certificate apparently costs if you go through Verisign. And mind you, it's just for one year, so you're essentially renting their approval for a year. What do you get? Different encryption strengths: if you're familiar with cryptography, the more bits in the key of the encryption algorithm, the more secure, in theory, the transmission is. Extended validation, I'm not quite sure what this means; it probably has to do with something like the duration of it. The warranty I've never really understood: you're going to pay $400, and somehow they're warranting your website for $1.5 million. I assume the fine print says something like, "If the cryptography we use is fundamentally broken, we will pay out this amount," but I'm just making that up. The reality is that all of this is pretty meaningless. And the fact that you get the right to put a "Norton Secured" seal on your website is atrocious, because anyone can put an image tag on a website that says something like that. A lot of these companies are trying to create an industry around sending a message of security to end users, but seeing this should never mean anything to anyone. It just means that someone knows how to embed an image on a website.

The takeaway here, too, is that using Verisign isn't necessarily all that compelling. If we instead go to GoDaddy.com, which again tries to sell you everything and the kitchen sink when you visit their website, it's at least more reasonable when it comes to SSL certificates: you can get away with $69.99 a year, or there's the premium SSL. Premium SSL, which a lot of these SSL providers have tried to market in recent years, really makes one fundamental difference. What does it mean when you visit a website and the address bar not only says HTTPS but also turns green and shows the company's name? What does it mean?
>>It's supposed to mean this site is really secure and you can really trust it.
>>Right. But in reality, what does it effectively mean, based on this?
>>They paid a hundred dollars [inaudible].
>>Exactly. They paid a hundred dollars instead of $70 to get that right. Now, before we just said these sentences, how many of you knew that a green address bar meant something fundamentally different? OK. So there's the question: is it really worth $30 to convince no one in this room that your site is more secure? I'm being a little pessimistic with all of this, but frankly, I do think this is a bit of a scam. We've built up this whole industry around something that in theory is actually a wonderful idea: these chains of trust, whereby if you trust someone authoritative, like Verisign or the like, you can then trust anyone they trust. But the reality is, it's so easy to get an SSL certificate these days. And until recently, most browsers did not put this crazy-sounding message in front of the user. You might see a little broken-link or broken-padlock icon, but they didn't really raise the bar. One thing Google has started doing is putting up a page like this. But I dare say, and this is a made-up statistic, that 9 times out of 10 when you see this message, it's just because someone hasn't paid for their SSL certificate for the year, and it has lapsed. I do this all the time: once a year, our website starts saying this because I forgot to pay the bill for the SSL certificate.

But fundamentally, it's a wonderful idea, because the warning might mean that you are visiting a site that is not who it claims to be, because you're the victim of what's called a man-in-the-middle attack, whereby someone has gotten into the middle of your DNS traffic. Even though you think you're visiting cs.harvard.edu, some bad guy sitting in Starbucks has actually led you to his website instead and is trying to trick you into typing in your username and password and the like. So again, the mathematics, the technology itself, is wonderful, but the fact that there's this market where people are paying hundreds of dollars versus tens of dollars is a bit unfortunate. But that's where we're at. Yeah.
>>This message will only appear if port 443 is active and SSL is being offered.
>>Enabled. Exactly.
>>Otherwise, it will not.
>>Correct. If the web server itself is not configured to listen, so to speak, on port 443, then you will just get a dead end, with a generic browser message saying "server not found" or something to that effect. So, per the configuration we started glancing at, you must at least have your website configured to listen on both of those TCP ports. Recall our discussion of ports on Monday.

We can do a little introspection here. If I click the X up here and then zoom in: "Server's certificate does not match the URL," "Server's certificate has expired," "Server's certificate is not trusted." So we're really not doing so well here. Let's click on Certificate Information just to see what... oh, but the irony is that we have a very secure connection to whoever the hell this is on the internet. So let's click Certificate Information, and we'll get a little more detail. It looks like this certificate expired in May. I'm guilty of the same, so I can't really poke fun at them for this. But if we click Details and scroll down, we see that the certificate they're actually using for cs.harvard.edu is actually for eecs.harvard.edu. That's Electrical Engineering and Computer Science. Unfortunately, I've just revealed who's responsible for this certificate, but he's no longer here, so it's OK.

The takeaway here is that there are a few solutions. One, you pay the bill, and then at least one of those messages goes away. And it's not just a matter of paying the bill: you have to download an updated certificate, with an updated expiration date, to install in your web server. But more than that, we also have to fix the domain name, and so you have a few options here. Either you buy a separate SSL certificate for cs.harvard.edu in addition to eecs.harvard.edu, or you buy what's called a wildcard certificate. For instance, in the course, CS75, we have this ourselves. It's unfortunately something like $199 a year, but what that means for your money is that you can protect, and avoid these kinds of warnings for, *.cs75.net: any subdomains you want. We happen to use things like mail and others for back-end technical reasons, so for us that actually tends to make sense. So there are a few solutions here.
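The three complaints Chrome showed (name mismatch, expiry, untrusted issuer) and the wildcard option can be modeled in a few lines. This Python sketch simplifies heavily: a certificate is reduced to a subject name, an expiration date, and an issuer label, all hypothetical, whereas real validation also checks the signature chain and the certificate's subjectAltName entries. The wildcard rule shown, where one leading `*` stands for exactly one label, is how names like `*.cs75.net` are treated.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass
class Certificate:
    subject: str     # e.g. "eecs.harvard.edu" or "*.cs75.net"
    not_after: date  # expiration date
    issuer: str      # who signed it


def name_matches(pattern: str, hostname: str) -> bool:
    """Match a host name against a certificate name, allowing one leading '*' label.

    "*.cs75.net" matches "mail.cs75.net" but neither "cs75.net" nor "a.b.cs75.net".
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    first, *rest = pattern_labels
    return (first == "*" or first == host_labels[0]) and rest == host_labels[1:]


def problems(cert: Certificate, hostname: str, today: date,
             trusted_issuers: FrozenSet[str]) -> List[str]:
    """List the same three kinds of complaints the browser showed."""
    found = []
    if not name_matches(cert.subject, hostname):
        found.append("certificate does not match the URL")
    if today > cert.not_after:
        found.append("certificate has expired")
    if cert.issuer not in trusted_issuers:
        found.append("certificate is not trusted")
    return found
```

For example, a certificate for eecs.harvard.edu whose date has passed and whose issuer is not in the trusted set yields all three messages when checked against cs.harvard.edu, while a current `*.cs75.net` certificate from a trusted issuer yields none for mail.cs75.net.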
And I should say, too, one other reasonably compelling reason to pay more money to a bigger fish than someone like GoDaddy or Namecheap for SSL certificates is that, as part of this chain of trust, the various browser manufacturers, Microsoft, Google, Apple, and so forth, ship their browsers (Safari, IE, and so forth) with certain certificate authorities' own certificates installed. In other words, it's up to those big browser companies to decide which certificate authorities you should trust. Some of those vendors, like Microsoft, might have a list of trusted certificate authorities this long, while Google's might be this long; it really depends on the company. So if you go to some fly-by-night operation, or you digitally sign your own certificate yourself, which is mathematically possible, and you or that fly-by-night SSL company is not trusted by Microsoft or Google, you're going to get this kind of warning.
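Browsers are not the only software that ships with a list of trusted certificate authorities. As a hedged illustration, Python's standard `ssl` module can report what its default client context enforces: it loads the system's trusted CA certificates and requires both a valid certificate chain and a matching host name. Which CA file or directory is used varies by platform and by how Python was built, so the path values may be `None`, and certificates in a CA directory may only be counted once they have been used.

```python
import ssl


def describe_client_defaults() -> dict:
    """Report what Python's default TLS client context checks."""
    ctx = ssl.create_default_context()  # loads the platform's trusted CA certificates
    paths = ssl.get_default_verify_paths()
    return {
        "verify_mode_is_required": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
        "ca_file": paths.cafile,  # may be None, depending on the platform
        "ca_path": paths.capath,  # may be None as well
        "ca_count": ctx.cert_store_stats()["x509_ca"],
    }


if __name__ == "__main__":
    for name, value in describe_client_defaults().items():
        print(f"{name}: {value}")
```

A self-signed certificate would be rejected by such a context unless the client explicitly opts in, for example with `ctx.load_verify_locations(cafile=...)`, which is precisely the trust a browser vendor has not granted.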
So one of the things you're paying for: if you, frankly, are a Fortune 500 company, and the difference between $300 and $1,000 is not such a big deal, then to make sure that more of your customers reach your website correctly, it might be worth spending more money. It could be that someone has the latest version of Android that, for whatever reason, did not ship with the right certificates, or someone's using version 1.0 of Netscape or something like that, and so certain certificates aren't trusted inside of that. So again, you're paying to minimize the risk of users running into this kind of unrecognized-certificate message. But that's orthogonal to the expiration, which is just a matter of our letting the bill lapse. Any questions? No? All right.

So how do you actually configure this? When you create your certificate by running a command on the computer, you end up with two files: one is a key and one is a certificate. Or rather, one is a private key and one is a public key. This line here, SSLCertificateKeyFile, is literally where our private key can be found on the server. For security reasons, I've faked it as /path/to/cs75.key, but it's somewhere on the hard drive. And I should make it clear: it is not in the same location as your HTML files and GIFs, because that would be stupid; anyone could just download it. So it's somewhere else. The SSLCertificateFile line is what you're paying for. You upload your public key to GoDaddy or Verisign, and they send you back, via e-mail or a download, a digitally signed copy, which has your big number plus essentially another very big number. And then you install that here.
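The rule that the private key must never live under the document root is easy to check mechanically. This Python sketch assumes hypothetical paths such as `/srv/www/cs75.net/html` for the DocumentRoot and `/etc/ssl/private/cs75.key` for the key; it resolves both paths and tests whether the key file sits inside the web-visible directory tree.

```python
import os


def is_inside(path: str, directory: str) -> bool:
    """True if `path` is `directory` itself or lies anywhere beneath it."""
    path = os.path.realpath(path)  # resolve symlinks and ".." first
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


def key_is_web_accessible(key_file: str, document_root: str) -> bool:
    """A private key under the DocumentRoot could be downloaded by anyone."""
    return is_inside(key_file, document_root)


if __name__ == "__main__":
    root = "/srv/www/cs75.net/html"  # hypothetical DocumentRoot
    for key in ("/etc/ssl/private/cs75.key", root + "/cs75.key"):
        verdict = "DANGER: downloadable" if key_is_web_accessible(key, root) else "ok"
        print(key, "->", verdict)
```

Comparing whole path components with `commonpath`, rather than plain string prefixes, keeps a sibling directory such as `html-backup` from being mistaken for part of `html`.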
And then, lastly, this chain file has to do with some SSL providers: just in case one of their certificate authority certificates didn't ship with the browser, this chain certificate essentially says, "We trust this issuer, so it's OK if your certificate is signed by them."

So I have glossed over some of the technical details, and it turns out that, as nice as the theory is, SSL itself is still completely broken, in that it can be circumvented. I'll try to dig up an article and post it on the lecture's page after tonight, if you're curious to see an interesting presentation on the various ways in which you can thwart SSL and trick users into thinking a connection is secure when it really isn't. So it's a nice story, but the whole world is broken anyway. Any questions about SSL?

There's one corner case that you need to be mindful of when setting up your own website. Running SSL on your website requires that you have a unique... fill in the blank. It requires that your website have a unique IP address. This is one of the genuine gotchas with SSL: you have a sort of Catch-22. Because SSL is about encrypting information, what gets encrypted? Really, everything in the request and the response. Everything inside of the virtual envelope is encrypted. What are some of the things inside the virtual envelope? Well, the GET line, and also...
>>The specific server you would be on, for virtual hosting.
>>Exactly. The Host header, which tells the server which vhost this is for. But here's the problem, as we look at the configuration here: every vhost can obviously have its own SSL certificate, because they might be foo.com and bar.com, which could be unrelated entities. This is a snippet of a shared web host's web server configuration. So you're getting an encrypted request, but the only way to figure out who the request is for is to decrypt the request. And to decrypt the request, you have to know who it's for, because the private key you should use is tied to that vhost. So again, you have this Catch-22: you can only figure out who it's for by knowing who it's for. In theory, there's a workaround: you could try decrypting with every private key you have on the system, but that's not necessarily deterministic, and it's also a little hackish, especially if you have hundreds of vhosts on the server. So the de facto result is that you just can't do it. But if you give every vhost a unique IP address and then effectively associate the certificate with that IP address, then you're safe, because you can just assume that if a request comes in on IP address w.x.y.z, it must be using this SSL certificate.

There is one corner case. If you have a wildcard certificate, like we in the course do, then thankfully we don't need a unique IP address for all of our subdomains: FTP, mail, web, and so forth. Because if they all come to the same server, you can use the same wildcard certificate to decrypt all of that traffic.
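The Catch-22 boils down to when the server must pick a key. This Python sketch, with made-up addresses from the 192.0.2.0/24 documentation range and hypothetical site names, chooses a certificate using only the server IP address the connection arrived on, since the Host header is still encrypted at that point. It models only the situation described in this lecture.

```python
from typing import Dict, List

# Hypothetical mapping from each of the server's IP addresses to the
# certificates configured for SSL vhosts on that address.
CERTS_BY_IP: Dict[str, List[str]] = {
    "192.0.2.10": ["foo.com"],              # one vhost per IP: no ambiguity
    "192.0.2.11": ["bar.com"],
    "192.0.2.20": ["*.cs75.net"],           # one wildcard cert covers many subdomains
    "192.0.2.30": ["foo.org", "bar.org"],   # two SSL vhosts sharing one IP
}


def choose_certificate(local_ip: str) -> str:
    """Pick the certificate for a new SSL connection.

    Nothing has been decrypted yet, so the Host header is unreadable; the only
    clue is which of the server's IP addresses the client connected to.
    """
    names = CERTS_BY_IP.get(local_ip, [])
    if len(names) != 1:
        # The Catch-22: several candidate keys and no way to choose between them.
        raise LookupError(f"cannot pick one certificate for {local_ip}: {names}")
    return names[0]


if __name__ == "__main__":
    for ip in CERTS_BY_IP:
        try:
            print(ip, "->", choose_certificate(ip))
        except LookupError as err:
            print(ip, "->", err)
```

Giving each site its own IP address, or covering all the names on one address with a single wildcard certificate, is what leaves exactly one candidate.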
So in short, when you sign up for a web host, if you want SSL, which frankly these days is just a
good thing to have and good practice to get into, it’s probably worth paying a few dollars more
to get a unique IP address. Because otherwise, your users will get that very scary red message.
Google’s Chrome makes it easy to click through. In Firefox, you literally have to click like five
buttons in order to get past the warnings. It’s atrocious. No normal user will ever figure it
out. So paying for an SSL certificate is sort of a necessary evil these days. The end result is
great: cryptography. But there are a bunch of hoops you have to jump through. All right, any
questions? Yeah.>>Will you post the paper that you mentioned [inaudible]?>>By morning. I’ll dig
up the URL and then I will post it on the lecture’s page of the website, so if it’s of interest,
you can check it out. Let me just pull up our slides here. And go to... so what about this? This
is among the more cryptic pieces of syntax that’s useful to know, or at least get comfortable
with, or get comfortable copying and pasting, because with Apache you can actually start to do
fairly powerful things. And this is perhaps one of the most common. This is using another feature
of Apache, and other web servers have very similar functionality, though they might call it
something different: URL rewriting. So mod_rewrite just means the rewrite module; this is an
optional feature you can enable on an Apache web server that lets you rewrite URLs. Now, even if
you’ve never seen this syntax before, what do you think these three lines of monospaced text are
doing? Yeah.>>Compensating for omissions and misspellings.>>Compensating for omissions and
misspellings? Sort of. That’s actually a good thought. The only catch there is that if the user
does mistype the address, it won’t necessarily work unless DNS is configured to at least deliver
the user to this endpoint. So in other words, if they accidentally type wwww.cs75.net, that will
be a dead end unless we have allowed it to work in DNS with an A record, for instance, or a
wildcard record, which is also possible. What else might this be doing, though? That’s on the
right track, though, and this is a very concrete case that we’re solving.>>Maybe it has
something to do with checking if it’s HTTPS.>>OK, good. So is it checking whether it’s HTTPS?
Technically it’s not, though it’s very close to doing that. We could tweak it in a certain way.
Yeah?>>Is this kind of like a redirect?>>It is a redirect, and what’s it redirecting from and
to, do you think?>>From the top one to the bottom one.>>From the top one to the bottom one, sort
of. So that’s actually pretty close. So let’s start teasing this apart. The very first line does
what it says: it turns this so-called rewrite engine on. Without that, and this is a common thing
I often forget about, nothing is going to work unless you explicitly turn the engine on. So the
first line does that. The second line is a condition. You can think of this as a somewhat cryptic
way of implementing an if-else type condition. So if the HTTP host variable... so what is this?
Anything with a percent sign, a curly brace, and then a capitalized phrase like that is what’s
called an environment variable on the web server. There’s a whole bunch of variables that are set
sort of automatically for you when a user visits, among them HTTP_HOST. And that is a variable
that specifies the IP address or, literally, the host name or domain name that the user visited.
It’s equivalent to the Host line, if you will, from the HTTP request. Now, the bang here is part
of a regular expression. If unfamiliar, a regular expression is a pattern that you’re trying to
match. Bang means not, so only if the HTTP host does not match the following pattern do we
proceed any further. So what are we trying to match? The caret symbol means what in a regex?
Regex is a fancy way of saying regular expression.>>Anything?>>Not anything; that would be
dot.>>Not any case.>>Not any case. Caret symbol, anyone else?>>Begins.>>Begins, perfect. So the
caret symbol means the variable’s value must begin with www. This is to avoid accidental
substring matching, where you’re matching part of the domain name but not all of it. So this
means you must start matching from www. In other words, the first characters in the host name
must actually be www. It can’t be xyzwww. So after www, I have a backslash dot. Based on what I
just said about the dot’s significance, what is backslash dot? Backslash is an escape character,
so it means literally a period. If you just say period, that means any character can be here;
backslash dot means only a dot can be here. So cs75\.net means it must literally match cs75.net,
and then this NC is fairly arcane; it just means no case. It’s case-insensitive. It doesn’t matter
if the user has the caps lock key on; this will still match if the word is correct. So if the HTTP
host is not literally www.cs75.net, proceed to the following line. What does the following line
say? This is a rewrite rule. If you have an if-else, this is the then part of the expression: if
so, then do this. This thing here, the pattern, let’s come back to; for now, focus on this part.
I am going to rewrite, or rather redirect, the user to https://www.cs75.net/$1. What might $1
refer to, for those familiar with regexes?>>I think it’s whatever the user typed in after .net?
>>Exactly.>>So that you wouldn’t have like a [inaudible].>>Exactly. So let’s go back to this.
What is this doing? Parentheses, in the context of regular expressions, generally mean capturing
parentheses. And this cryptic sequence of symbols here means dot star. Dot is any character, and
star means zero or more of the preceding thing. So this means: zero or more characters, capture
them. Where are you capturing them from? Exactly what you said: anything after the slash that the
user typed in is captured by these parentheses and by convention is stored in a variable called
$1. If I had a second pair of parentheses over here for whatever reason, then I would also have
access to $2, and a third and fourth pair would give me $3 and $4. So it’s a generic way of
handling however many parentheses you might have, and you can refer back to each capture after
the fact. So this just ensures that if the user visits something/abc, I will not just redirect
the user to www.cs75.net and leave it at that. I will also have the courtesy of sending them to
/abc.
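For those who want to experiment with the rewrite logic outside Apache, here is a rough Python
emulation of the three lines as described. The transcript doesn’t reproduce the slide, so the
directives in the comment are a reconstruction from the spoken description, and rewrite_url is a
hypothetical helper, not part of Apache. It also assumes, as in a per-directory .htaccess file,
that the path handed to the rule arrives without its leading slash.

```python
import re

# Reconstructed (assumed) directives, based on the spoken description:
#   RewriteEngine On
#   RewriteCond %{HTTP_HOST} !^www\.cs75\.net [NC]
#   RewriteRule (.*) https://www.cs75.net/$1 [R=301,L]

HOST_PATTERN = re.compile(r"^www\.cs75\.net", re.IGNORECASE)  # [NC] means no case
RULE_PATTERN = re.compile(r"(.*)")  # capture everything after the slash as $1


def rewrite_url(http_host, path):
    """Return (status, location) if the redirect applies, else None.

    `path` is assumed to arrive without its leading slash, as it does
    for rules in a per-directory .htaccess file.
    """
    # The "!" in RewriteCond: proceed only if the host does NOT match.
    if HOST_PATTERN.search(http_host):
        return None
    match = RULE_PATTERN.match(path)
    # $1 is the first capturing group; R=301 makes it a permanent redirect.
    return 301, "https://www.cs75.net/" + match.group(1)


print(rewrite_url("cs75.net", "abc"))      # (301, 'https://www.cs75.net/abc')
print(rewrite_url("WWW.CS75.NET", "abc"))  # None
```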
And this is infuriating: how few websites actually do this, especially on mobile phones. If
you’re in the habit of reading news or what-not on your phone, this is a detail that drives me
nuts. I’ll go to, like, Google News, which has links to all sorts of websites, I’ll click through,
and for whatever stupid reason, the website will decide, “Oh, you came to us from Google News, but
we want to show you the mobile version of our website, so let us send you to m.news.com,” or
whatever it is, completely forgetting what URL you were at. So the end result is you can’t view
the article that you clicked on. How do you fix this? With something as simple as this. Now, if
they’re not using Apache, it’s going to be a little different, but the fix is fundamentally that
simple: remember what the user typed in. So again, in terms of user experience, in terms of
running your own websites, it’s a super simple thing to do and certainly to your users’
advantage, because if you’re like me, you just leave that news site and never come back, because
it was annoying to visit in that case. All right. And how about a couple more technical details?
R equals 301. Anyone want to guess what that’s referring to?>>Isn’t that the redirect one?
>>Yeah. It’s the redirect status code that we talked about on Monday. 301 means what,
specifically? Moved–>>Permanently.>>– permanently. So this is in contrast with 302, which
happens to mean moved temporarily. Who cares? Like, why are these two separate codes, do you
think, when their functionality is essentially the same?>>If it’s moved permanently, computers
can save that.>>Good. If it’s a 301, and thus permanent, the browser, if it’s smart, will cache
that response, and the next time you, the human, try to visit the same page, you’re just going to
be automatically redirected without wasting the server’s time asking the same question. Whereas
302 means it’s temporary: you probably should check back with me.
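Here is a toy Python model of the caching behavior just described: a hypothetical client that
remembers 301 responses, and so skips the server on later visits, but never stores 302
responses. Real browsers’ caching rules are more involved (headers such as Cache-Control matter
too), so treat this only as an illustration.

```python
class RedirectCache:
    """Toy model of how a browser might treat 301 vs. 302 (hypothetical)."""

    def __init__(self, server):
        self.server = server   # callable: url -> (status, location)
        self.permanent = {}    # url -> location, for remembered 301s only
        self.requests_sent = 0

    def visit(self, url):
        # A remembered 301 lets us redirect without contacting the server.
        if url in self.permanent:
            return self.permanent[url]
        self.requests_sent += 1
        status, location = self.server(url)
        if status == 301:
            self.permanent[url] = location  # permanent, so safe to remember
        # A 302 is temporary, so nothing is stored; we ask again next time.
        return location


def server(url):
    if url == "http://cs75.net/":
        return 301, "https://www.cs75.net/"
    return 302, "https://www.cs75.net/login"


browser = RedirectCache(server)
for _ in range(3):
    browser.visit("http://cs75.net/")
    browser.visit("http://cs75.net/private")
print(browser.requests_sent)  # 4: one for the 301, three for the 302
```

The flip side is visible in the same model: once a 301 is remembered, the client keeps following
it without asking, even if the server’s configuration later changes.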
So the upside is, you save a little time. The user gets a response a little bit faster. The
downside, though, is what? What’s the downside of 301, do you think? Again, start thinking about
corner cases and problems you might be creating by trying to be helpful. For the... what’s that?
>>In case it gets reverted back.>>In case it reverts back. Suppose that you decide to reconfigure
your server, or you change its name, or whatever. You know, it’s not something you do commonly,
but the day you do it, are you going to be tricking your users into visiting a dead end? And so
you have to be mindful, especially if you’re the person doing the web server configuration, not
the development of the website. Maybe we should make sure both of these addresses continue
working for some number of days or weeks, so that anyone in the world who had cached this
response has time to reboot their computer or quit their browser. So these are the kinds of
corner cases to be mindful of, especially when you care so much about uptime and making sure your
users don’t hit dead ends. L, which you probably won’t guess, means last. If you have a whole
bunch of these rewrite conditions and rules in the same file, this is just a way of saying,
“That’s it. Don’t bother processing any other rules in the file. We want this redirect to kick in
right away.” So find fault with this. I’m looking at my own line here, and there’s technically a
bug, even though it’s not likely ever to be encountered. How could I have been a little more
rigorous in defining this, do you think? Specifically, I’m thinking about my pattern matching.
It’s not quite as robust or correct as it probably should be, if you want to be really
nitpicky. Yeah.>>I don’t know if it would help, but could you add HTTP in front of www?>>HTTP
in front of www. Oh, good thought, but adding HTTP would actually break this, because the
HTTP_HOST variable, by definition, and you can only know this by reading the manual, does not
contain the protocol; it only contains the host. How about if I point at the end here? What could
I be doing better? Yeah.>>The slash at the end.>>Good thought too. A slash doesn’t belong,
though, because it’s part of the path, and the host is literally just the host. But there is
something missing there. If you are familiar with regular expressions, it could be–>>[inaudible]
a dollar sign, I think, at the end.>>Exactly. You would like to think, or it’d be a nice world,
if the caret symbol represented both the beginning and the end of a string, but for some crazy
reason the world chose the dollar sign for the end. So I should really put a dollar sign after
the T here, because that would mean you have to literally match net and that’s it. Now, why is
that relevant? Well, it’s probably not that relevant, because I do not know of any top-level
domains that exist today that start with net and have more letters after them. But there’s this
tendency now where the world is creating much longer names. In fact, if you pay well over
$100,000, you can apply for .google or .apple. Someone could get .networksolutions, and as soon
as they do, the pattern match is not quite right. But again, it has no real material effect,
because if DNS weren’t set up, the user would never reach me. But again, it’s just a little thing
to be mindful of: the pattern is not as precise as we could be.
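A quick way to see the missing-$ problem is to compare both patterns with Python’s re module,
where ^, $, \. and case-insensitive matching behave like Apache’s regular expressions for simple
patterns like these. The host name www.cs75.networksolutions is, of course, hypothetical.

```python
import re

loose = re.compile(r"^www\.cs75\.net", re.IGNORECASE)    # as described above
strict = re.compile(r"^www\.cs75\.net$", re.IGNORECASE)  # with the $ anchor added

for host in ["www.cs75.net", "WWW.CS75.NET", "www.cs75.networksolutions"]:
    print(f"{host:28} loose={bool(loose.search(host))} strict={bool(strict.search(host))}")
```

Only the last host separates the two: the loose pattern treats www.cs75.networksolutions as if it
were the canonical host (so no redirect would happen), while the anchored pattern does not.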
All right. So, what is this... OK, that was really technical. Who cares? What is this really
doing? Why would the user ever reach my website and not already be at www.cs75.net? What is the
point of these three lines, from a user’s perspective, or really just the big picture here? How
else could you visit the site, even today, with your laptops? Yeah.>>Use FTP.>>OK, FTP, but then
this won’t even kick in, because this is just for the web: port 80, just HTTP. How else could you
visit the course’s home page? Yeah.>>There could be an error in one of the DNS servers that
[inaudible]–>>OK.>>[inaudible] someone who doesn’t intend to go to your actual [inaudible].>>Oh,
so that’s good. So if there’s a DNS error, or there’s just some maliciousness going on, you could
be led to our website and... right? We did this Monday. What was this stupid little demo I did on
the fly that made a certain news company look a little silly?>>You changed the name for CNN.
>>Yeah, right? I think I had davidnews.com all of a sudden, and we went there and we stayed
there. And I mentioned at the time that if CNN just put, like, two lines of configuration in the
file, they could fix this and immediately redirect the user to protect their branding, so that
the user goes back to www.cnn.com. This is exactly the fix. Now, we’re not doing it because we’re
worried people are going to come up with, like, a fake cs75.com or stupid stuff like that. It’s
the simpler case: what if they just visit http://cs75.net and hit enter? We just decided as a
course, like most websites on the internet, to standardize not on cs75.net, which we want to work,
but to redirect users so that they end up at www.cs75.net. Now, why? One reason is just, you
know, branding. There’s something to be said for at least standardizing what your URLs look
like, whether they have the www or not. But more than that, as we mentioned briefly on Monday,
and we’ll revisit this in time, there’s the cookie issue, whereby if you do have a subdomain, you
can isolate cookies to the www subdomain, and they don’t have to be global to your whole domain
name, cs75.net.
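The cookie point can be sketched with the domain-matching rule browsers follow (RFC 6265): a
cookie set without a Domain attribute is returned only to the exact host that set it, whereas a
cookie with Domain=cs75.net is sent to cs75.net and every subdomain under it. The helpers below
are a simplified, hypothetical illustration of that rule, not a complete implementation (they
ignore, for example, public-suffix checks and IP-address hosts).

```python
def domain_matches(request_host, domain):
    """Simplified RFC 6265 domain-match: the host itself or any subdomain of it."""
    request_host = request_host.lower()
    domain = domain.lower().lstrip(".")  # a leading dot in Domain= is ignored
    return request_host == domain or request_host.endswith("." + domain)


def cookie_is_sent(request_host, set_by_host, domain_attr=None):
    """Would a cookie set by `set_by_host` be sent to `request_host`? (hypothetical)"""
    if domain_attr is None:
        # Host-only cookie: returned only to the exact host that set it.
        return request_host.lower() == set_by_host.lower()
    return domain_matches(request_host, domain_attr)


# Set by www.cs75.net with no Domain attribute: stays on www.
print(cookie_is_sent("www.cs75.net", "www.cs75.net"))               # True
print(cookie_is_sent("mail.cs75.net", "www.cs75.net"))              # False
# Set with Domain=cs75.net: shared across the whole domain.
print(cookie_is_sent("mail.cs75.net", "www.cs75.net", "cs75.net"))  # True
```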
So in other words, that’s all these lines are doing for us, and these are literally the lines we
use on our website. If I go to http://cs75.net and hit enter, where do I end up? Well, a couple of
places. One, I end up at www, just because. But I also end up at the SSL version, also just
because. And then this part is just because we’re using MediaWiki, software that automatically
makes the default page one called Main Page, for no good reason. So there are a few things going
on there. So you can infer from this, though: how can you enforce use of SSL on your website?
Suppose you’re a bank, or suppose you’re Gmail these days, and you want to force users to stay on
HTTPS even if they visit HTTP. How do you do it? Well, it’s pretty much the same trick here. But
rather than check the host name, which is not the problem now, you want to check SSL. So what you
can do in this case is use a line like this instead: RewriteCond %{HTTPS} !=on. This is slightly
different syntax, and it’s a different condition that asks a different question. If the
environment variable called HTTPS is not equal to on, what’s the implication? It means it’s off.
And so what should you do? Well, the next line is that same rewrite rule: you redirect the user.
So this is one way you can enforce SSL on a website.
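The same enforce-HTTPS idea can also live in application code rather than in server
configuration. Below is a minimal sketch of a Python WSGI middleware; force_https is a
hypothetical name, and the sketch assumes the app is mounted at the site root directly behind the
web server, so that the standard wsgi.url_scheme key reflects whether the client used HTTPS
(behind a proxy that terminates SSL, it may not). Like the rewrite rule, it preserves the path
the user asked for.

```python
from urllib.parse import quote


def force_https(app):
    """Hypothetical WSGI middleware: redirect plain-HTTP requests to HTTPS."""

    def middleware(environ, start_response):
        # Analogous to RewriteCond %{HTTPS} !=on: act only when not on HTTPS.
        if environ.get("wsgi.url_scheme") != "https":
            host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
            # Keep the path (and any query string) the user asked for, like $1.
            location = "https://" + host + quote(environ.get("PATH_INFO", "/"))
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return middleware


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = force_https(hello)
seen = {}


def start_response(status, headers):
    seen["status"], seen["headers"] = status, dict(headers)


app({"wsgi.url_scheme": "http", "HTTP_HOST": "www.cs75.net", "PATH_INFO": "/banking"},
    start_response)
print(seen["status"], seen["headers"]["Location"])
```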
Yeah.>>And so this checks for every page? So say someone goes to [inaudible] .com slash
banking–>>Exactly.>>– it will still [inaudible] send them to the HTTPS version?>>Exactly. This
will work for every page on the website, because we had that additional use of the capturing
parentheses to ensure that they don’t just go back to the generic home page, which is just
annoying, at least in my experience, but rather they go to slash whatever they were at. And to be
clear, this gets put either in that file called httpd.conf or, as you’ll also see, in the
per-directory configuration files that Apache supports, called .htaccess files, literally just a
text file called period H-T-A-C-C-E-S-S. The syntax there looks very similar to this. But you
can’t necessarily do everything in an .htaccess file that you can in the main server
configuration. It depends on whether people like us, the system administrators of a website, let
you put certain commands in a directory. So you can use .htaccess files to password-protect
directories, for instance, or to change MIME types, and other fairly arcane details. But this is
one of the most compelling uses. And there’s actually another one. On Facebook, if you’re a user,
many of the URLs end in what file extension, as we said on Monday? .php, just because, for
historical reasons, they still use PHP for a lot of their front-end stuff. But there’s no
technical reason to expose what language you’re using on your server. In fact, it feels like it’s
just a waste of four bytes, right? Why bother sending .php when it’s strictly not necessary? And
frankly, it’s very Web 2.0 these days to have cleaner, prettier URLs that just don’t have cruft
like file extensions. httpd.conf and .htaccess files can also be used to let you avoid ever
putting .php in your URLs. Your files on your hard drive can still be called hello.php, but the
user could just visit /hello, and using mod_rewrite, you can essentially tell the web server: if
the file /hello does not exist, look for /hello.php, and if that exists, serve that up instead.
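The lookup just described can be sketched in a few lines of Python. This is a hypothetical
illustration of the logic, not mod_rewrite itself (in Apache it is typically expressed with
RewriteCond file tests such as -f): it takes the requested path and a set of files that exist in
the document root, and falls back to the same name plus .php only when the exact file is missing.

```python
def resolve(path, existing_files):
    """Hypothetical helper: map a clean URL like /hello to a file on disk.

    Serve the file if it exists as requested; otherwise fall back to the
    same name plus .php; otherwise report that nothing matched.
    """
    if path in existing_files:
        return path           # exact file exists: serve it as-is
    candidate = path + ".php"
    if candidate in existing_files:
        return candidate      # /hello -> /hello.php
    return None               # neither exists: would be a 404


files = {"/hello.php", "/style.css"}
print(resolve("/hello", files))      # /hello.php
print(resolve("/style.css", files))  # /style.css
print(resolve("/missing", files))    # None
```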
Yeah.>>No, nothing.>>OK. So, a lot of power. I will say, too, this is one of the things that
frustrates some people, including myself, the most, because with the slightest syntax error
anywhere, or if you get the permissions of the file wrong, your whole website can break. So it’s
a lot of power, and a lot of trial and error, and a lot of googling sometimes to solve these
problems. All right. Any questions? No? All right. So, where can you use stuff like this? Well,
next week, when we start talking about the first project, we’ll introduce the appliance, a
virtual machine in which you have your own copy of Apache running. And certainly after the
course, or even during the course if you want to experiment with other approaches, it’s actually
very easy to get LAMP onto your own computer. You don’t need to pay for a web host, and you don’t
need to set up a Linux computer. You can do it on your own Mac or PC. In fact, Mac OS these days
comes with Apache, PHP, Python, and Perl, a lot of support for web-programming-related stuff
built in, even though you sometimes have to run some commands to actually enable it. Your laptop
is not a web server by default, even though Apache is in there if it’s a Mac. Windows tends not
to come with as much software along these lines, but either way there are some packages, XAMPP
being one of them, that make it pretty easy to set up a web environment on your own computer, not
necessarily for serving content to real users. We had that discussion on Monday: getting users
from the outside world to your home, with your cable modem and all that, is not trivial, and your
ISP might not even let you. But for development purposes, you don’t need an actual web server per
se. You don’t need to pay anyone to start doing web development. You can do it on your own local
hard drive, even if it’s not static content, HTML files, but actually dynamic content with
something like PHP. So XAMPP is just the product name for free software that supports Linux,
Mac OS, Solaris, and Windows. So it doesn’t matter what OS you have, and it installs for you
Apache, MySQL, PHP, and even Perl, which is the other P in XAMPP and sometimes the P in LAMP.
So, what does this mean? It means you go to their website,
which you can find by just googling XAMPP, and you install the software. And ideally you then
have some nice documentation locally, and your own database, your own web server, and your own
installation of PHP, so you can do all of your development locally, which is nice because it’s
super fast. And it means you can work in a cafe or what-not without even having internet access.
There are some corner cases. XAMPP hasn’t historically been the easiest to set up. Sometimes it
does not quite work on everyone’s computers, which is why we actually transitioned to the VM
approach, where we can guarantee that everyone’s environment is the same and works correctly. But
certainly moving forward, when you no longer want to rely on course-provided software, realize
this is a nice local development option as well, and similarly one that you can configure almost
any way you would like. Any questions, then? All right. It feels like a good point to take a
five-minute break, and when we return, why don’t we dive into PHP and actually finish the back
end of something like google.com. So let’s take five. All right, we’re back. So just a couple of
details. You should have received, or should soon receive, an email invitation from the course’s
discussion tool. We’ll post a link and an announcement on the course’s home page explaining where
to go and what to do if you do not receive such a link, but it would have gone to the email
address with which you registered for the course, FYI, in case that’s not an address you use very
commonly. But again, more details on the course’s home page by tomorrow. Let me introduce another
of the course’s TFs, Allan, who, if you wouldn’t mind coming up close to my microphone, would like
to say hello to the class.>>Hi everyone. My name is Allan. [inaudible] I’m here to take your
questions and help you out with anything you–>>OK. Excellent. And Peter, whom you met on Monday,
will be back shortly this evening, and once lecture wraps, we’ll dive into section, which again
will be an opportunity for slightly more intimate Q&A, to go over concepts that might be a little
more abstract. And particularly once the first project is released, which will be on July 9th, it
will be an opportunity to focus on the project and get direction, guidance, and design tips. So,
more on that to come. All right. So, time for some PHP. So recall that we
talked briefly about some of the basic UI mechanisms that browsers allow: radio buttons, text
fields, text areas, checkboxes and the like. And these really are going to be the fundamental
mechanisms whereby we go from static websites, with just HTML and CSS, to dynamic websites, with
some kind of server-side intelligence that does something based on user input to produce dynamic
output. These days, thankfully, the web is getting more interesting and sexier than some of these
old-school UI mechanisms. But even the fanciest of autocomplete widgets that you see, and
calendaring things where you can choose dates and the like, are still built on top of these, just
all the more stylized these days with JavaScript and with CSS. And so we’ll look at some of that
fancier use of input mechanisms in a few weeks, when we get to AJAX and JavaScript itself. So here
is a representative snippet of Google. Recall that on Monday, we started implementing the same
interface, even though it was all black and white, in text. But we did have a text box, and we did
have a couple of buttons, and when you clicked that submit button, you initially ended up nowhere,
right? We ended up on my same file, which is not dynamic at all. But then I went in and changed
the action attribute so that we actually submitted to Google. So technically we cut some corners
and didn’t implement a dynamic website ourselves, but we did look at the basic mechanism whereby
form input becomes a GET request, or, as an alternative to GET, a POST. For those familiar, what
is one or more of the fundamental differences between using GET versus POST? Yeah.>>Oh, GET is
actually going to include what you entered in the form in the URL.>>OK.>>And POST just doesn’t
put it there.>>OK. Excellent. So a GET request puts the state in the URL itself. And that’s
exactly what we saw on Monday with Google, where we had question mark, and what came next?
Question mark–>>Q.>>– q equals whatever, harvard, whatever I type in or the user types in. So
POST does not do that.
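To see the difference concretely, here is a small Python sketch that builds both kinds of request
for the same form field with the standard library’s urllib.parse.urlencode. The field name q
mirrors the Google example; the /search path, the host, and the helper names are illustrative
assumptions. With GET the encoded field becomes part of the URL on the request line; with POST the
same bytes travel in the body, announced by a Content-Length header.

```python
from urllib.parse import urlencode


def get_request(path, params, host):
    """Build a GET request: the form data is appended to the URL."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def post_request(path, params, host):
    """Build a POST request: the form data goes in the body, not the URL."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )


print(get_request("/search", {"q": "harvard"}, "www.google.com"))
print(post_request("/search", {"q": "harvard"}, "www.google.com"))
```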
So, that’s a nice distinction, but what are some more distinctions, or what would motivate you to
use GET versus POST if functionally they could be the same? You could still get search results
via POST, for instance, even though Google, as an aside, does not support POST. What else should
drive you to GET versus POST, or vice versa? Yeah.>>Well, if you’re on a site that tends to deal
with uploads, wouldn’t you use POST, since it has special ways to deal with large files–
>>Excellent. Yeah. So GET requests are not so great for things like file uploads, photo uploads,
right? If anything, conceptually, this just makes no sense: how do you upload a file in a URL?
Now, technically, you can encode it using something called base64 encoding, where you convert the
binary image of zeros and ones to As and Bs and Cs and 1s, 2s, 3s and so forth.
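A quick check with Python’s standard base64 module shows why this is impractical for anything but
tiny files: base64 turns every 3 bytes into 4 ASCII characters, so the text is about a third larger
than the file. The sizes below, a 3,000-byte file and a roughly 5-megabyte photo, are placeholder
data.

```python
import base64

for size in (3_000, 5 * 1024 * 1024):  # a tiny file and a roughly 5 MB photo
    data = bytes(size)                  # placeholder bytes (all zeros)
    encoded = base64.b64encode(data)
    print(f"{size:>9} bytes -> {len(encoded):>9} base64 characters")
```

Even the small file becomes 4,000 characters of text.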
But the other gotcha is that most browsers have a limit on the maximum length of a URL.
Unfortunately, this is not standardized, and it’s barely even documented. But the rough rule of
thumb is that if your URL is several hundred characters long, it’s probably too long, and at
something like 1,024 characters you’re definitely pushing your limits. However, it’s completely
browser-dependent. Some browsers support 8,000-character URLs, others maybe 1,000-character URLs,
but the takeaway is that, really, you have to deal with the lowest common denominator, whatever
that is. And so anytime your URLs start getting long, it’s probably time to rethink your design
and start using something called AJAX, which again we’ll look at, or using POST. POST has no such
limit built into the protocol, though servers can impose their own caps. In fact, one of the
upsides of POST is that, in its HTTP headers, it tells the server how big the file or parameters
being posted are, so that the server knows when it has received everything. So the browser figures
out, OK, this is like a 5-megabyte photo, so I’m going to tell the web server through the headers
to expect 5 megabytes. And then what the server gets below all the headers is all the crazy zeros
and ones of the file, but it knows where they stop. So it knows when it’s received the whole
photo.
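Here is a minimal Python sketch of how a server can use that header to know where the body stops:
it reads up to the blank line that ends the headers, looks up Content-Length, and then takes
exactly that many bytes. read_post_body is a hypothetical helper that works on an in-memory byte
string; a real server reads from a socket and must also handle other cases (for example, chunked
transfer encoding), which this sketch ignores.

```python
def read_post_body(raw):
    """Split a raw HTTP request into headers and a body of Content-Length bytes."""
    head, sep, rest = raw.partition(b"\r\n\r\n")  # headers end at a blank line
    if not sep:
        raise ValueError("incomplete headers")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line itself
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get(b"content-length", b"0"))
    if len(rest) < length:
        raise ValueError("body not fully received yet")
    # Anything past `length` bytes is not part of this request's body.
    return headers, rest[:length]


raw = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: www.cs75.net\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello<next request starts here>"
)
headers, body = read_post_body(raw)
print(body)  # b'hello'
```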
So POST is great for that. What else is POST compelling for? What other use cases, besides file
uploads? Put on your paranoid hat. If you’re using GET, what are you at risk for? Yeah.
>>Somebody actually sniffing what the user sends.>>Perfect. And what might the user send that
could be sensitive?>>I mean, you wouldn’t really send the password or a–>>Good.>>– username in
a GET request.>>OK, good. So user names, passwords, credit card numbers, anything that’s arguably
sensitive probably should not be submitted via GET, because it ends up in the URL. And why is that
bad? Well, fundamentally, it’s still being sent to the web server, and if it’s over SSL, it’s at
least encrypted. However, it’s not hidden from your family members or your friends or your
roommates who might sit down at your same computer. And you know what you can do with most
browsers today: browse through the history, right? If it’s in the URL, that means it’s going to
get logged in your history, and it’s going to end up in autocomplete until the history is manually
cleared. It’s just too easy then for someone to find it. And it’s also going to end up somewhere
else. Even though it might be transmitted over SSL, so that random people on the internet or at
Starbucks can’t see it, once the server gets the request, many servers, as you can maybe infer
from the httpd.conf configuration file, have logs. And what tends to get logged in logs? Not POST
bodies, because they could be huge, 5 megabytes and what-not. But what typically does get logged?
GET requests, including the URL that was visited. Which means any website that’s ever used GET for
password authentication or credit card submission, which would be rare but could happen,
especially if the person does not know what they are doing, ends up with that data in its logs.
Which means some random person’s unencrypted log files have all of your sensitive information.
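To make the logging risk concrete, here is a Python sketch that formats a request the way a typical
access log in Apache’s Common Log Format records it: the full request line, query string included,
ends up on disk, while a POST body does not. The IP address, timestamp, URLs, and credentials are
made up for illustration, and clf_line is a hypothetical helper.

```python
def clf_line(ip, when, method, target, status, size):
    """Format one access-log entry in Common Log Format (%h %l %u %t "%r" %>s %b)."""
    return f'{ip} - - [{when}] "{method} {target} HTTP/1.1" {status} {size}'


when = "10/Jul/2012:19:05:00 -0400"  # made-up timestamp
# A login form submitted via GET: the password is part of the request target.
print(clf_line("192.0.2.7", when, "GET", "/login.php?user=alice&pass=hunter2", 200, 512))
# The same form submitted via POST: only the path is logged, not the body.
print(clf_line("192.0.2.7", when, "POST", "/login.php", 200, 512))
```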
So in short, anytime something is big, or anytime something is sensitive, GET is not the way to
go. However, then it would seem that it’s just fine to use POST all the time, right? Just avoid
all these issues altogether. I don’t have to remember what the difference is. But what’s the
downside of using POST? Based on your own, maybe even non-technical, user experience, what’s the
downside? Yeah.>>Can’t copy-paste the URL–>>Yeah.>>– [inaudible].>>Perfect. You can’t
copy-paste the URL. That’s a completely reasonable concern, especially from the user-experience
perspective, because it’s very reasonable for someone to want to copy the URL and say, “Oh, check
out this book” or “check out this link,” whatever it is you’re looking at. And it’s actually
pretty infuriating when the person who receives the email says, “Oh, I only see their home page,”
because the site just redirected them, for a number of reasons. One, the state that was necessary
to remember that book, the ISBN or whatever, was not in the URL because they were using POST. Or,
even worse, some websites, and I think even the Harvard Coop does this: when you navigate around
their website, the URL similarly doesn’t change, because the information is stored, as best I can
tell, in their session. That’s something we’ll talk about next week or later tonight, whereby
where you are is remembered only by the server, thanks to a cookie, which means even you can’t
bookmark your own pages that are of interest to you. So in short, horrible design, and some
websites are very much guilty of this. So anytime you want the user to be able to save state,
whether in an email or just with the back button, it’s helpful to make sure it is in the URL
itself. Of course, there’s another reason. This is getting better these days with modern browsers,
but typically with POSTs, if you click reload, you’ll often get prompted, and the browser will
say, “Are you sure you want to resubmit this form?” So there are also issues of resubmitting forms
and what-not that are typically bad. And so one of the things that’s gotten more common these
days, to avoid people accidentally checking out twice or buying things twice on an online store,
and to avoid that “are you sure you want to resubmit this form?” message, is this: once the user
does a POST, because they have uploaded something or bought something, you immediately redirect
them with a 301 or a 302. In practice, browsers follow those redirects with a GET, so you cannot
use them to re-POST somewhere else, FYI. Then the user, if they accidentally hit reload or hit back
in their browser, is only going to go back and forth between GET requests, not a POST. So you can
also discourage the user from submitting a form again. And there are other protections you can put
in place, but that’s another reason, too: if you want to avoid resubmission of forms, sending a GET
via redirect can be one level of protection against that.
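Here is a minimal sketch of that post/redirect/get pattern as a Python WSGI app. The /checkout and
/thanks paths and the in-memory order list are hypothetical, and the form parsing is deliberately
bare-bones. It answers the POST with 303 See Other, the status code HTTP defines for “now fetch
this other page with a GET”; a 302 is also commonly used for this in practice.

```python
import io
from urllib.parse import parse_qs

ORDERS = []  # hypothetical stand-in for a real database


def app(environ, start_response):
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if method == "POST" and path == "/checkout":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode())
        ORDERS.append(form.get("item", ["?"])[0])
        # Answer the POST with a redirect, not a page, so reload/back only repeat a GET.
        start_response("303 See Other", [("Location", "/thanks")])
        return [b""]

    if method == "GET" and path == "/thanks":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"Thanks! {len(ORDERS)} order(s) so far.".encode()]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(method, path, body=b""):
    """Drive the app directly, without a server, and collect the response."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    content = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], content


print(call("POST", "/checkout", b"item=book"))
print(call("GET", "/thanks"))  # reloading this page does not create another order
```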
fairly rapid tour of this, because again the course
does assume nontrivial prior programming experience. So this is another
detail to where if you find yourself what
is programming, again, we should have a
conversation right after class or over the LAN or with Peter
if you’re more comfortable about what your own background
is because we’re going to start talking about things
like arrays and hash tables and associative arrays. And if this is all new to
you, it’s definitely going to be a bigger challenge but we’ve certainly had
students do it before, so use your judgment
along the way. So, one of the best things about PHP is its
documentation to be honest. It’s actually fairly
user-friendly, very nice to navigate
and so let me just follow up an arbitrary example,
kind of a boring function but one that’s commonly used. If I Google PHP date function, I can go up to a representative
documentation page here. And just to give you quick tour of something you’ll see
much more when you dive into the course’s projects,
along the left-hand side of the website is a list
of all of the related or available functions, PHP
is actually not this slow of a language usually. Let’s try reloading. OK. She didn’t– oh, so, actually there’s an
interesting lesson there. So actually, let’s try
this rather than just give up on this altogether. Let me see if we can–
oh, damn network. So I was going to pull
up Chrome’s network tab so we could look at exactly what
was hanging there, but it seems to have resolved itself. So, a quick tour then
of the page here. So on the left-hand side are
all of the related functions, just FYI, a little overwhelming
at first, but the reality is, for this class and really
in general, you’re not going to need to know every one of
these functions; just looking them up on demand is
typically enough. On the right-hand side is the
canonical layout of a function. So, it tells you
first what version of PHP supports this function. This is actually important not
so much when you control your own server, because then
you’ll be running a pretty
recent version of PHP, 5.1, 5.2, 5.3, or 5.4, some fairly recent
incarnation, with 5.4 the latest. But there are some
web hosting companies that might still be running
PHP 4. That’s not terribly common, but you will lose a huge amount of functionality, including
much of the object-oriented programming support that came with PHP 5, if you are on something
as old as PHP 4, just FYI. So, what does this function do? It formats a local date
and time, which means if I give it a format string like “H:i”
for hours colon minutes, or something like that, it should return to me a
formatted string like 15:00, that is, 3:00 p.m., or something like that. So, that’s what it does: it
gives me the current time or it converts a numeric
time stamp to a date. So, here is how you
parse the signatures. This means it returns a string. This means it takes a string as its first argument,
which is the format. Any variable in PHP, as we’ll
see quite a bit, starts with a dollar sign. Square brackets in documentation
mean a parameter is optional, which means if you want to
override the current date and time, you can pass a
new numeric time stamp. For those unfamiliar,
a time stamp in many programming languages
is the number of seconds since January 1st, 1970, UTC,
the so-called epoch. And then you can override
the default behavior. That’s useful if you’ve stored time
stamps in, like, a database and you want to display them in some human-friendly
way after the fact. All right. This returns a string
formatted according to the given format string,
dot, dot, dot. Here’s just some more
detail on the format, so the format parameter can
apparently be a quoted string containing all of these
various placeholders, d for the day of the month with leading zeros, j for the day of the month without
leading zeros, and so forth. Memorizing this is not a
good use of any human’s time, but looking it up is reasonable. Let’s just scroll
down, past all of that. Timestamp does what I promised. The return value is a
formatted date string. If you do something wrong,
it goes on to explain that there’s an error. And then let me scroll
down here. The examples, frankly, are where my eye is typically
drawn most immediately. So, if I take a look here,
this gives me some little cheat sheets. If I want to print
out the day of the week, I can echo date(“l”); for whatever reason, lowercase L
denotes the full name of the day of the week. If it is Monday, it
would print Monday. Today, it would print
Wednesday, dynamically. Here’s a more
complicated string that they claim will print out
this, and so on and so forth. This is the kind of thing
that this function does. But the takeaway is,
for our purposes now, just that PHP’s documentation is
always structured in this way: a summary of the function up top, a
description of the parameters, some version notes
in case you need to be aware of what
version of PHP you have, then example one, example
two, example three. And then at the bottom,
there’s generally some pretty intelligent discussion on the
comment threads that are there. It’s not really crazy talk. They seem to moderate
it quite well, so you actually see people
sharing useful code for common workarounds or common tricks
that someone might want to do related to
the date function. So in short, the documentation
will be your friend, and what we will do in
lecture is not go through mind-numbing tours of
the various functions that exist and so forth, but focus much
more so on the concepts, on the syntax, and on
the overall framework, so that as you
dive in to “how do I do this, how do I do that,” you know where
it fits in the big picture of a project. So PHP is an interpreted
language. What does it mean for a
language to be interpreted? Or what is the opposite of an interpreted language,
even though they’re not truly literally opposites? Yeah? [ Inaudible Remark ] A compiled language. So a
compiled language is something like C or C++, a language
that has source code written in English-like syntax
but you have to run it through a compiler like GCC
or Visual Studio or the like and it outputs what’s
generally called object code or more specifically zeroes and
ones that are patterned in a way that a CPU like an
Intel CPU understands. An interpreted language
skips that step, essentially, whereby instead you
write the source code and then you pass
your source code through what’s called
an interpreter instead of a compiler and then an
interpreter essentially reads that language that
you’ve written, the source code you’ve written,
top to bottom, left to right, doing line by line exactly
what you tell it to do. So the upside is, there’s
no intermediate step, you don’t have to run the
compiler then run your program. In an interpreted world,
you just run your program through the interpreter
and that’s it. It’s one step instead of two. But what’s the downside
of the fact that it’s interpreting it
line by line as opposed to converting it
to zeroes and ones?>>Performance.>>Performance, typically. So compiled languages
tend to be faster because you’re spending
more time, memory, and disk space upfront
to convert source code to object code, zeroes and
ones, but once it’s zeroes and ones, it’s super ready to be
read and understood by the CPU. Whereas an interpreted
language typically needs to be literally interpreted
again and again, and every time I call the
date function, D-A-T-E needs to be parsed or read and
then converted effectively to the underlying functionality. Now, there exist
compilers of sorts for PHP and for other interpreted
languages, and also what are called
opcode caches. More on this at the end of
the semester when we talk about scalability, but
for now it simply means that smart web servers and interpreters will do
the interpretation once, convert it to some intermediate
format and then save that intermediate format, which in the PHP world is
called opcodes, O-P-C-O-D-E-S. And this just means it will skip
that step the next time around. It’s not quite compiled
but at least it’s better, it’s a closer approximation
to it. Frankly, that’s a nice thing
about interpreted languages: you don’t have to go
through that annoying step of recompiling and recompiling every time you make a
change, so you can interact with your code a
lot more fluidly. It just saves some steps,
especially for large projects, which might have a large number
of files and lines of code to actually compile otherwise. So, some upsides
and some downsides. If you’re crazy popular,
like Facebook, well, Facebook actually has a
framework called HipHop for PHP, which they released as
open source a while back, which actually compiles PHP down
to C++, which is then compiled into object code
to get maximal performance out of the code that they write. And this is motivated
by a number of things, but among the things they
discuss publicly is that PHP is fairly omnipresent
and it’s a fairly easy language for people to learn,
especially coming out of college and the like. So it means they can have their
developers using a language that’s fairly easy to learn,
that they probably already know, and they can then defer
the performance issues that are typically associated
with the language to some of their more advanced engineers,
who can then take PHP code down to something that’s
even more highly performing. So that’s among the options
that exist these days. As for a lot of the arguments
you might see on the web about performance of PHP
versus Ruby versus Python versus Java versus this, there are many, many
different technical solutions to the performance question. And a very valid heuristic, I
think, when choosing a language, whether it’s going
to be PHP or another, is what you already know,
what it would cost you to develop in or to
learn something else, what your friends or
your colleagues know, and also what tools exist to
mitigate the price you pay for using something like
an interpreted language. So suPHP: this is something
that will be installed in the CS50 Appliance. It is installed on some
web hosts, but not nearly as many as would be good. So suPHP is substitute
user PHP, and it exists to solve the following problem. When you have a web server, you
have software running on it that listens for connections
on port 80 and so forth. Years ago, most such servers
ran as a username called root. Root is the administrative
user, and running anything as root is generally bad. Why? Yes?>>Well, you can, like,
destroy your computer.>>You can destroy
your computer, how? Be more specific.>>Well, you can remove,
like, files that are essential to the operating system.>>Good. So the root
user has full-fledged access to the system. If it’s a web server,
and you’re running web code as root, and you make a mistake and
accidentally delete the wrong directory, that is permanent.
Root can touch anything on the system, including
the password file, which, even though the passwords in it are encrypted,
should not generally be shared with the world. So in short, running anything
as root puts you at risk if what root
is running is buggy. And odds are, you’re human,
you err, you’re going to write buggy code sometimes. And who’s
running that buggy code? The most important
user on the system, which means your entire
machine could be compromised if you screw up. So, finally, the world years
ago got into the habit of at least running web
servers in particular as a different username. Sometimes it’s literally
the username “nobody”, or www, or
apache, or httpd. It doesn’t really
matter what it is, it matters that it’s not root. But some problems arise, especially in this popular
world these days of virtual hosting and commercial
web hosts. Just think of
this: if you are customer A and there’s a customer B, and
you both use someone like DreamHost or the like, you each
have accounts with them, and you have your own
usernames and passwords, and you have your own home
directory, so to speak, where you can store your code. But the web server runs under
username Apache, for instance, Apache again being the
web server software. In terms of permissions,
Apache is not you, obviously, because you are A or you are B,
so you have different usernames. But if Apache is the web
server, and the web server obviously needs to be able to see your
files in order to serve them up, what kinds of permissions do your
files need, if you’re familiar with Linux file permissions
or Windows, or really file permissions
in general? Your files have to be what’s
called world readable, typically. You can do more fine-grained
permissions, but the reality is, on most systems, the
easier approach is that you’re told to chmod your files
644, more on that in the future, which makes your files
world readable. Why? Not because you really
need the world to read them, but because you need the web
server to be able to read them, including
especially your PHP code, which we’re
about to start writing. So, what’s the implication,
though? There are a few things. If your files are world readable so that this middleman Apache
can read them, that’s great. It makes the website work. But it also means that someone
else can read your file, too.>>That would probably
be the other customer.>>The other customer,
customer B, right? Because world readable
is world readable. Now, if your files are being
served up on a web server, that means you can see your
files at URLs like /hello.php. So that means anyone on the internet knows what
your files are called. So the other customer,
because he or she can log in to the same server, can
just enter your directory, and even though they might
not be able to list all of your files, if they
know what they’re called, they can then definitely
read your files by just using a text editor
or some kind of program that just opens these files. Now, that’s not such a big
deal for JavaScript, for CSS, because frankly, who cares? That stuff is, by the nature
of JavaScript and CSS, going to be sent to the browser
and the whole world anyway. And you might try to obfuscate,
as we’ll discuss in a few weeks with minification and
hiding things from users. But you can’t really protect
your intellectual property when it comes to JavaScript
and CSS because the browser and the whole world
have to see it. But with PHP, you might put
a lot of heart into it, and you’ve put a lot of
intellectual property into your PHP code, which
is really the secret sauce of your business or whatever. But now, since the web
server needs to be able to read it, so can any customer. So now, you are at risk of the other customer seeing all
the hard work you’ve done. In fact, what might your files
contain out of necessity, if you’re familiar with
databases and the like? Yeah?>>The PHP would need
to contain the name of the database and
the password.>>Exactly. Things like usernames and
passwords for databases, for caching engines, for
Facebook APIs, whatever it is, your PHP code might have some
secrets inside of it that do need to be there, but you don’t want other customers
being able to see them. So in short, running a web
server as Apache is great for the security of the server as a
whole, but bad for the security of customers A and B and C, who probably don’t
even know each other and certainly shouldn’t
trust each other. So, thankfully there’s
a solution here and it comes in different forms. One of the solutions
for PHP is suPHP. In the suPHP model,
customer A’s code is executed as a username called A,
that customer’s own username. B’s code is executed by
username B. In other words, the web server sort of magically
transforms itself into user A when it’s time to execute A’s
code and transforms itself into user B when it’s time to
execute B’s code which means if you screw up and have buggy
code and you’re customer A, whose files could you possibly
delete under this model? Only your own. And, you know, that still
might be unfortunate but at least you’re not
compromising anyone else on the system. And it’s your own fault if
you delete your own files, but it’s a good thing that
you can’t delete anyone else’s files. This also solves another issue. Suppose your website is
a commercial website, even a small one with only hundreds
or thousands of customers, and those customers need to
upload files like photos or videos, stuff that’s
not meant to be public in the Facebook sense but is
fairly private, at least in a limited sense. So, the upside here is that when
a user uploads a file now and the web server is using
suPHP, that file will be saved on the disk as owned by user
A, and B’s files will be saved as user B. By contrast,
in the other model where everything gets run by
Apache, who saves the files? Apache, which means
Apache owns the files and that means the
only way to ensure that they can be
accessed subsequently is to make them world
readable which means all of the new content your
users are uploading is going to be readable by
customer A, and B, and C, and D on the system. So, in short, this is good, but this is not a feature
that’s typically advertised much by web hosts. I don’t even know if
DreamHost does it these days. I’m going to guess they don’t
because we didn’t see mention of it, but don’t
hold me to that. You might want to dig a little
deeper into the fine print. But if you are using something
like a virtual private server, you can also avoid this issue
altogether, because if you have the whole server to yourself, even if it’s a rented
virtual machine, at least there are no other
customers on the same server. So, again, something to be
mindful of so that, you know, when you pay $8.95, $5.99,
whatever it is per month, you know you’re getting
what you pay for. And if you care about
your intellectual property and the security of your
site, these are the kinds of questions you should be
mindful of asking or reading up on before signing up. So, suPHP is something that will
be installed in the appliance. So, for those who would like to
read up on the language itself, this week, there will just be
recommended readings of sorts. Realize that there are
some good tutorials online. And again, if you have
a programming background in any syntactically
similar language, some of these might even be
boring, which would be great, because they’ll walk
you through for loops and while loops and the like. So, we’ll just do a
quick tour of some of these syntactic details
tonight, but then focus on some of the higher level concepts that will be distinct
to web programming. So, without further ado, one
of the more stupid details, but I just put it out there because it’s the
first thing you see. Variables, again, start
with dollar signs in PHP, and here is the rule
as to what is valid. In short, I would name
variables in a sort of normal way, typically
with alphabetical letters, but there are some other things
you can use, like underscores and numbers (just not a number as the first character), and the like. But again, we won’t
spend too much time on this level of detail. As for data types, PHP is what’s
called a loosely typed language, which means the data
types exist, kind of, but they’re not enforced
in the same way that they are in Java or in C or in C++. So, what data types
exist? Booleans, integers, floats, and strings. But when you declare a variable,
you do not specify its type. It is inferred from the type of value you actually
put inside of it. So, if you say something
like $x = 123; (again, the dollar sign
means this is a variable, and $x is a very boring
name for a variable, but it’s a variable),
the data type of that value will be integer,
even though I didn’t specify it as such. If by contrast I say
$x = 1.23; then it’s instead going to be what? Yeah?>>Float.>>Float, a floating point
value, a real number. If you instead say = true or = false, it’s
going to be a boolean. If you instead say “hello”,
it’s going to be a string. But that type is not fixed. If you try to use a string
in a boolean context, you get a lot
of implicit conversion. So, in other words,
in an if condition, normally you would say
something like if ($x == $y), or you would say if (true),
something like that. If instead you say
if (“hello”), well, “hello” will be implicitly
cast to a boolean, and because “hello” is neither empty nor the
string “0”, the boolean value of “hello” is going to be true. So, you can use strings
even as truth values, which can encourage
sloppy programming, and we’ll see some
examples of this, but it’s also useful
sometimes, in that it’s not as pedantic a language
as something like Java, where you are constantly
casting things back and forth. Yeah?>>Can you perform
string operations on integers or vice versa?>>Good question. Can you perform string
operations on integers and vice versa? Yes, the integer will be converted
to a string in that case and become part of
the string itself. And one of the motivations
for this is that PHP from the start was really
designed to be web-centric, and the reality is, when
you’re writing web software, you’re interacting with the
user entirely via strings. Now, the user might
type in 123, but as we’ve seen via HTTP GETs
and talked about with HTTP POSTs, it’s all text at
the end of the day. There’s no data type associated
with an HTML input field. So, even though the user
might type 123, what’s going to be sent to the
server is the string “123”. And so the fact that there is
this loose typing is reasonably consistent with what you’re
getting from the user anyway, even though again it
can feel a little messy, and it is in some sense,
but that’s at least one of the original motivations
for it. In terms of objects
and collections, PHP has arrays and it also has
objects, more on those to come. And there are also
things called resources. A resource is something like this:
when you open a file, what you get back is not the
file per se, you get sort of a pointer or a
reference, in C or Java-speak. And that reference is to
a resource, which is sort of like a special object that contains interesting
information, the size of the file, your
location in it, the type of it, and so forth,
details like that. Null is null. It’s when you have
no value there. You can have the value null as
a placeholder, but variables in PHP, as we’ll see, can
also be set or not set. So, null is an actual value. It doesn’t mean the
absence of a value. You can have the absence of
a value as we’ll soon see. And then there’s mixed. So, mixed isn’t really a type
but you’ll see these things in documentation, in particular. If you see PHP.net
documentation that says a function takes
mixed, what does that mean? Well, it means it can take
any number of different types. It can accept a string
or a number, and this is where PHP is both
handy but also a little sloppy, in that it’s not strictly
typed or strongly typed. Number means integer or float,
if the function doesn’t care which. And a callback is
sort of like a function pointer. We won’t spend too much time on
those, but you can pass functions around, and a function
passed around that way is generally known as a callback. Another word on mixed: PHP
is notorious for its design of returning mixed data types. So it’s very common in
PHP for a function, even one like date, to return a string, but if something goes wrong, it
could actually return a boolean. And what’s it going to
return in that case? False. So it’s very common
in PHP that
99% of the time a function will return a
certain data type, but it could return
something very different. So, learning to check for
that correctly is important in PHP. So, I’ll point that
out along the way as well. So now, some special variables
before we start writing some code. So, in PHP, there are
special global variables that are called superglobals. They are in scope, so
to speak, everywhere. In any line of code you write,
so long as it’s executed by a web server, you have
access to these variables. They start with a dollar sign, then an underscore,
and then all caps. So, $_GET is a variable. It’s going to be an array. It’s going to be an associative
array, AKA hash table, AKA– not really an object, but
it’s a key-value store. What do you think is in
that variable called $_GET? Take a guess. Yeah.>>All the things that are in
the URL.>>Exactly. So, Q equals Harvard,
foo equals bar, baz equals qux, whatever the user submitted via
the form is going to be handed to you on a platter, so to speak,
in the form of this variable, so that if you want the
value of Q, you just have to look inside that variable. And this is one of the things
that’s compelling about PHP. In contrast, with a language like Perl,
which was very popular years ago for web programming, you either
jump through hoops or use a popular library to actually
parse the HTTP requests to get access to
the keys and values. PHP and frameworks like Django
and Ruby on Rails make this so much easier these days. And PHP does this
via the superglobals. $_POST, as you might guess: anything you
POST ends up in that array. $_FILES is great, too. If you do let the user
upload photos or whatnot, you’re handed details about the uploaded files, like their names and where they were temporarily saved, in the
form of an array. You don’t have to parse the request or figure out how to
deal with file uploads yourself, super easy in that sense. Some of the more esoteric
ones now are SERVER and ENV. SERVER contains things
like the user’s IP address and their user agent. What was a user agent? Yeah?>>The browser and
the operating system.>>The browser and the operating
system, that cryptic string that gets
sent every time the browser visits you. And this ENV variable is rarely
used, but it gives you access to environment variables and other lower-level details
on the machine. COOKIE is nice, and we’ll
come back to that next week. But COOKIE stores cookies,
keys and values that browsers send back to you. $_REQUEST, meanwhile, merges
the contents of GET, POST, and COOKIE into one array, so you can look up a parameter without caring how it arrived. The raw details of the request, like what path they requested or whether there was a question
mark in the URL with parameters, are in SERVER, before PHP parses them into more user-friendly places
like GET and POST and COOKIE. And SESSION is one of the
most powerful ones, arguably. It is the thing that allows
you to implement state and implement things
like shopping carts, even though HTTP, as we sort
of began to discuss on Monday, is stateless, in that as
soon as you visit a page and the page is loaded, you disconnect from the
server, and you no longer have a connection
to the server. Via cookies, you can remember,
or rather a server can remember, that you’re logged in,
and we likened the cookie on Monday to a hand stamp. And what is SESSION? SESSION is this amazing
superglobal in PHP into which you, the programmer, can
put anything you want: any keys and values, any
numbers, any strings, any ISBNs of things a user put
in their shopping cart. And the next time the
user visits your website, so long as their cookie
hasn’t expired, you can access that exact same data
in $_SESSION, magically, so to speak. You don’t have to worry about
figuring out who the user is. PHP and in turn the
web server do all that for you out of the box. So again, another upside
of a language like this. So let’s actually see this
in action rather than talking about it in the abstract. So, last time, recall that we
had this file. Let me go to cs75.net, Lectures, where we
posted a video and more. And in our source
code directory, typically, if we write some
source code on the fly during lecture, I’ll clean it up
and then upload it to the server the next day in case you want to
play around, so you don’t have to write down code and whatnot. Or if we have some stuff in
advance, I’ll put it there. So this is from Monday,
and we had this site, Google and Google Search. And when I submitted this,
recall that if I search for Harvard enter, I ended up
at– enter, oh, we broke it. I should fix– I will fix this. Recall that I broke it
at the very end of class, by changing Q
to something else altogether; I think I said, like, QQQ
or something random like that. So now, instead of using
Google to do our back end, let’s write
the back end ourselves. So I’m going to go
ahead and do this. First let me grab this page
source and I’m going to open up our little text
editor as before. And yeah, this is what
I did wrong last time. So now it’s back to Q. But this
time, I’m going to change this to point at my own server. So now, a word on a server.
If I scroll over here, this is my CS50 Appliance, the
virtual machine that, in a week and a half’s time, we’ll
start using as well. It’s a Linux
computer, but more than that, even though it looks
like a desktop with a little Start-like menu
as in Windows, it’s still a server, and I can see this as follows. If, inside of
the appliance, I visit Google, I see Google, but if instead
I go to http://localhost: localhost is the common name
for a Linux computer when you’re on the Linux computer itself. And this is true in Mac OS as
well, and sort of in Windows. localhost refers to
the computer you’re on. So when I visit http://localhost/,
that is, just slash, enter, I should see the root
directory of the web server. So this is what I’m seeing. The fact that I’m seeing
this page, and actually, it tells us literally what
it is, “This page is used to test the proper operation
of the Apache HTTP server after it’s been installed. If you can read this page, it
means the web server installed at this site is working
properly but has not yet been configured.” So that’s great, that’s
exactly what I wanted to see. It means the web server
is working, and now it’s up to me to actually populate
it with some data. Now I can do something else. And these kinds of steps,
if you’re unfamiliar, we will explain before
the first project. What I’ve done
right now is open up a so-called terminal window. This is an old-school
black and white interface for navigating the
contents of a computer. It’s like the DOS
prompt of yesteryear. In Mac OS, it’s the
Terminal window. Windows sort of has an analog in
the command prompt, but it’s not as flexible as on
Linux and Mac OS. And I can do a few things here. Again, we’ll document
this more in the future, but this is a fairly arcane
command for making a directory: mkdir, space, the
name of the directory, here public_html. Now I’m going to go
ahead and hit enter. And what that lets
me do, ignore the Control-C, is now run cd public_html; cd stands for
change directory. And having changed directory,
now I am inside of it, so cd is like double-clicking
a folder in a modern operating system,
which then opens a new window. So cd has now put me
inside of public_html. So now I’m going to
go ahead and do this. I’m going to go ahead and run
a command like gedit hello, or let’s do google.html. gedit happens to be a
text editor for Linux, so it’s like TextEdit,
it’s like notepad.exe, but this one’s a little nicer in that it supports something
called syntax highlighting, whereby my code will
be colorized to be more user-friendly. So let me go ahead and copy
what we wrote on Monday over here and paste it in. So this is what I mean
by syntax highlighted. It’s just pink and
purple and whatnot, just to draw our attention to
the semantically different parts of the web page, and now I’m
going to go ahead and hit save. So Control-S, or I
can go to the File menu. And now, let me go back
to that terminal window, and again I’m back in Linux
here, and I’m going to go ahead and do ls, and notice I have
a file called google.html. And I can run all
sorts of commands. There’s the cat command, which
shows you the contents of files. There’s the more command, which
shows you the contents of files one screen at a time. You can do any number of things. I can accidentally delete it with the rm command,
but don’t do that. I can do all sorts of things
at the so-called command line that I could otherwise do with a mouse and
a keyboard, traditionally. So, what’s the takeaway here? Now that I have google.html,
notice that I have it in my public_html directory,
but if you can infer, who am I at the moment? What’s my username? Yeah?>>jharvard?>>jharvard, so I
am John Harvard. Why is that? Well, we configured this
particular virtual machine with a generic username,
jharvard, for John Harvard, so that anyone can use it
and so that in documentation and whatnot, we can tell you
exactly what your username is. It just gets a little
more annoying if everyone has a unique username, because troubleshooting
is harder and so forth. So just assume you’ve signed
up for a web hosting company, and they have arbitrarily told you
your username will be jharvard instead of A or B. So now I’m in John Harvard’s so-called
home directory, the folder that I get
for all my storage. And in there I created the
public_html subdirectory or folder. And in there, just to be clear,
what’s inside of public_html at this point in the story? google.html. So how do I visit google.html? Well, I’m going to open my
Chrome browser, and rather than visit just localhost,
I’m going to actually do this: http://localhost/~jharvard/google.html.
So this is a convention
on a lot of web servers. When you want to access
a specific person’s web directory, you do slash, tilde,
username, slash, filename. You do not type what, apparently? public_html. So public_html is implied by the
tilde and username in the URL, so don’t type public_html
in the URL itself. And now I’m going to
go ahead and hit enter. And voila– damn it, broken. So what does this mean? First of all, which–
what’s the status code here? Has anyone spotted it? Yeah?>>Forbidden.>>Forbidden, 403. You can see
it in the tab at the very top. So that’s one of those
more arcane status codes. 404 is a little more
common, Not Found. The file is there, but
I’m forbidden. So, just at a high level, in
English, what does this mean? Yeah?>>Didn’t set the permissions.>>I haven’t set
the permissions. So we talked earlier about the
idea of world-readable permissions. Now let’s frame this
in a Linux context. And again, Mac OS
is very similar. Windows isn’t quite the same
process, but the ideas exist on all of these platforms. So, let me do ls for list again. This is like dir if you
come from a Windows world. And I see google.html,
not all that enlightening. But I can do a long listing: ls -l, then enter. So -l, for those less
familiar with Linux, is the switch,
the command-line switch or flag or option, whatever
you want to call it, that modifies the
behavior of the command, which in this case
is ls. Enter. And now I see more output. What do I see now? I see first who owns the
file and what its group is. By default the
appliance is configured so that there’s a students group
with only one student in it, jharvard. But when you install
your appliance, you’re not sharing
the same appliance. You have your own
copy of the appliance with one jharvard account. This means the file is 424 bytes, which means 424 characters
I typed into that file. This is when we last edited it. This is the name of the file. And I skipped the most interesting
part, which is over here. Now, this is maybe
a little cryptic, but rw generally
denotes read and write. And what we have
here is an indication of three types of permissions. So this is a very quick crash course. Again, you don’t need to
commit all of this to memory yet, because it’ll come up
again in the actual projects. But what we’ve just
done here is– let me actually copy
and paste this. We have this sequence here. What in the world
does this mean? Well, first, I’m going to
cheat and set aside the first character. That first dash is
either a d if it’s a directory, a hyphen if it’s a
file, or something else if it’s something else. But for now, let’s just
assume that directories and files are all that exist. So now there’s this, and let
me put some spaces in. It looks like we have a
pattern of triples here. The first triple is
the owner, so to speak. The second triple is the
group, in this case, students. And the last is the world. So what is the implication
right now? The owner can read
and write this file. The group, students,
can read and write. That feels a little
worrisome, but in this case, the virtual machine
is on my own computer. There’s a students group,
but I’m the only student, so this is kind of immaterial. It’s not great, but it’s not
bad; it’s just not applicable
at the moment. The whole world, though,
can read this, and that’s what I
want for an HTML file. So it feels like my
permissions are right. What else could be wrong, then? Again, the context is that the web
server is running as apache or some username
that’s not me right now, and we have to give
it access. Yeah?>>I have a question.>>OK. [ Inaudible Remark ] The last dash? In this case, no. This is actually OK, and
others would be possible. Technically, rw or
r would be fine. Or even rw, nothing,
r would be fine. The point is that the world
has to be able to read it. But what else does the
world have to be able to access, do you think?>>Directory.>>The directory, right? We’ve got to go one level higher. So how can I do this? Well, when I did ls -l a moment
ago, I only saw the file. Let me do ls -al, which is
all, in long format. As you can see, you can combine switches,
typically in Linux, just for convenience,
like this: -al. Now I see more. The first two lines
are dot and dot-dot. What does dot represent
in a typical file system? Sure. [ Inaudible Remark ] Close. What does dot represent? Oh, let me change the question. What does dot-dot represent? Excellent. It means the directory
above. So dot, though, by
contrast represents?>>Current folder.>>The current folder,
the one that you’re in. So dot is where you are, and
dot-dot is your so-called parent, which just means
the folder that
your folder is inside of. So dot here refers to the
directory called public_html. Dot-dot refers to
my home directory. And now– I know what it is. Damn it. OK. So, I’m going to have to
fake the story slightly for just a moment. Everything is actually correct. There is another secret
setting that I changed earlier in the week while playing with the virtual machine
that explains this. It’s a feature called SELinux,
for Security-Enhanced Linux, which, with the setting I had changed, was preventing the web server
from serving John Harvard’s files. So let me see if I
can quickly fix this, but this was a wonderful stroll
through the diagnostic techniques that would have led
us to the solution. [ Pause ] Uh-huh, oops, and here we go. OK. So, this is a detail you
will not trip over yourself, because by default what I just
did is already done for you. It’s just that I disabled it while we were
playing around the other day. This was an additional security
mechanism called SELinux, which comes with flavors of
Linux like Fedora, CentOS, and Red Hat, and it’s meant to
lock down systems even more. But it doesn’t matter, because the
story we told is still very much the same. In fact, I can now simulate how
we could have created a problem for ourselves, as follows. Let me go into this directory, and everything now
looks correct. All of this is good, because
it means a few things. google.html is readable
by the world. What do you think x means
for both dot and dot-dot?>>Executable.>>Executable. Now normally, executable
means execute a file, run a program, but that’s
not the case for directories; notice the d and the d here. For directories, if a
directory is executable, that means someone
can get into it. They can’t necessarily read it
and list its contents; that’s what r is for. Execute means they can do
the equivalent of cd into it, or they can visit a URL that
contains that directory’s name. So the fact that this is x, this is x, and this is r
is actually perfect. That’s what we want. But I can simulate
it being wrong. Suppose that by default,
when I’d created this file, it looked like this. What’s wrong now
with this picture? What jumps out at you? Yeah.>>That it’s only readable and
writable by the owner, and no one else can access it.>>Perfect. It’s only readable and writable by the
owner; no one else can read it. That’s a problem. So there are a bunch
of ways to fix this, but the way we’ll
introduce for now is chmod, which stands for change mode, and then
a, for all, aka everybody, plus, and then what I want to give everyone,
r. So the syntax is a little arcane, but this
command does what we want: chmod a+r google.html. Change the mode of
google.html to give everyone r.
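To make the triples and the a+r notation a bit more concrete, here is a minimal PHP sketch, assuming a Linux or Mac OS system (chmod() behaves differently on Windows). The helpers perms_string() and add_read_for_all() are hypothetical names, and the temporary file merely stands in for google.html; chmod(), fileperms(), tempnam(), and clearstatcache() are standard PHP functions. It decodes a mode into the nine rwx characters that ls -l prints after the file-type character, then ORs the read bit for all three triples (octal 0444) into a file that starts out owner-only, which is roughly what chmod a+r does.

```php
<?php
// Hypothetical helper: render the low nine permission bits as the three
// rwx triples that `ls -l` prints for the owner, the group, and the world.
function perms_string(int $mode): string
{
    $out = '';
    foreach ([0700, 0070, 0007] as $i => $mask) {
        $shift = 6 - 3 * $i;            // 6 for owner, 3 for group, 0 for world
        $bits = ($mode & $mask) >> $shift;
        $out .= ($bits & 4) ? 'r' : '-';
        $out .= ($bits & 2) ? 'w' : '-';
        $out .= ($bits & 1) ? 'x' : '-';
    }
    return $out;
}

// Hypothetical helper: a rough equivalent of `chmod a+r FILE`, which ORs the
// read bit into all three triples and leaves every other bit unchanged.
function add_read_for_all(string $path): bool
{
    clearstatcache(true, $path);        // fileperms() results are cached
    $current = fileperms($path) & 0777; // drop the file-type bits
    return chmod($path, $current | 0444);
}

// A temporary file stands in for google.html.
$file = tempnam(sys_get_temp_dir(), 'perm');
chmod($file, 0600);                     // owner read/write only: the broken case
clearstatcache(true, $file);
echo perms_string(fileperms($file) & 0777), PHP_EOL; // rw-------

add_read_for_all($file);
clearstatcache(true, $file);
echo perms_string(fileperms($file) & 0777), PHP_EOL; // rw-r--r--

unlink($file);
```

Taking a permission away is the mirror image: g-w, which comes up later with suPHP, amounts to clearing a single bit, as in chmod($path, $current & ~0020).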
The plus means give; minus means take away. So enter, ls -al, and now
that problem is solved. By contrast, if the
directories looked like this, propose to me how we
fix the problem now. Now my dot and dot-dot
directories are no longer executable, which means my file
is readable, but no one can get into this directory via the web. How do I fix this?>>A plus x.>>OK. Good, a+x for
executability, and then the name of the file, or rather
folder, which is dot, and I can actually put
a space-separated list of these things on
the command line. I can hit enter, and now with ls
-al, we see we fixed that problem, too. Now suppose I goofed, and suppose
I do chmod a+x google.html. You can maybe guess what’s
going to change. So think to yourself what
this line is going to look like. In just a second, now it
has an x everywhere as well. Does this mean anything? In this case, no; it’s an
HTML file, a static file. Making it executable
means nothing. And so, is this going
to break anything? No, it’s just kind of
wrong in principle. On some servers, your PHP files do need
to be executable, but that is not the case
on most web servers. Typically, they just
need to be readable. And we’ll now see
some PHP, all right. So that was a lot of
fun, making google.html. Now, let us pretend to
implement a Google server. I’m going to go ahead
and hit New; let me copy this temporarily. So, new file, and I’m going to
save this as server.php. So, our very first PHP file:
we’re going to pretend to be Google for
a moment. Enter. And now I’m going to start, you
know, I’m going to cheat here and say, you know
what, I don’t want to do any of this just yet. I’m going to just do something
silly like Coming Soon. So this, I argue, is PHP. I named the file server.php,
so I claim you now know PHP. And why is that? Well, in the world of PHP you
can actually commingle HTML and CSS with raw PHP code. So the fact that I haven’t
actually written any PHP code is actually kind of sad,
because this is not really PHP, but it will still work. So let’s actually take
a look at what happens. I’m going to go into google.html
now, which again we made Monday. And I’ve already fixed
the query string. But I don’t want to go to
google.com’s search now; instead I’m going to
change this to server.php. In other words, when I submit
this form now, I want it to go to my own file, just
to see what happens. So let’s go ahead
and pull this up. And let me go ahead and type
in Harvard again. Enter. Wait a minute, something
is wrong. What did I do wrong? I certainly did not implement
this. Yeah. [ Inaudible Remark ] Perfect, right? Stupid mistake, right? Caching, right? The page has to be reloaded for the browser to actually get the
new copy of the HTML. So let’s hit the back button,
and let’s then reload here. And now, let me do
a sanity check. I’m going to right-click and view page source, and
now it’s correct. This is what the browser
is now seeing: server.php. So here we go, I’m
going to search for Harvard now and hit enter. Hmm, problem. So this is a security feature
that’s actually provided by suPHP. Just for good measure, suPHP
does not want your PHP files to be writable by group or others. Why? Because if the file is writable
by others and something goes wrong, someone other than you could change the
file itself. So we can fix this using what we
already know of chmod. ls -al– oops– ls -al. The problem is that the PHP file is
writable by group. How do I take away that w
from my group, do you think? Yeah.>>Use g minus [inaudible].>>Perfect, g-w:
chmod g-w server.php, enter. And now I do ls -al,
and that’s OK. And you know what, I’m going
to do one more thing: chmod a-r
server.php. And now, here is the output. This is actually wrong now; I need to give myself read access back. So chmod u+r server.php, where u is the user who owns the file (o would mean others). ls -al– oops– let’s cheat here. So now what do we see? OK. So now, I argue that
this is sufficient for PHP. Whereas JavaScript and HTML
and CSS and GIFs and PNGs and JPEGs need to
be readable by all, I argue now that PHP files
only have to be readable by me. Why this distinction? Why does this make
sense in the context of what we’ve discussed
so far today? Yeah.>>This is just a wild guess. Maybe the PHP just
runs on the server, not for the actual user
on the other side–>>Perfect.>>– the user is just getting
what the PHP outputs, so the file itself is irrelevant
in this case.>>Exactly. So whereas static files
like JavaScript, CSS, HTML, and JPEGs are ultimately sent
literally to the user to be viewed and
seen by him or her, PHP is meant to be first
interpreted by the server, and then the server
will send the output of that PHP file to the browser. Now at the moment, we have
kind of a silly example. Inside of server.php is
no PHP code whatsoever. What’s inside of
there? Just HTML. So what’s going to happen when
I reload the page and resubmit that form, the web server,
Apache, is going to realize, “Oh, you have submitted a
form to a PHP file.” Why? Because it ends in .php. “I am configured, because of
the way the LAMP stack works, to interpret .php files
using the PHP interpreter,” which is just a program
that understands PHP. Now, the PHP interpreter is
going to look for PHP code. Anything that’s not PHP code, it’s designed to just
spit out raw. So anything in the file,
even if it ends in .php, that is not PHP code itself just gets sent
raw to the browser. So what is the user going
to see in this case? Literally all of my HTML, because
I haven’t written a single line of PHP code yet. But the point, though, is that
because it did end in .php, the principle is the same: only
the web server has to be able to read that PHP file in
order to interpret it. But who is the web server going
to be running as for PHP files? Yeah.>>Jharvard.>>jharvard, because
of the suPHP feature. Substitute user PHP means that for any PHP file, it substitutes the
user who owns the file, so that the security mechanism
we discussed is in place. So I’m going back to my
browser, I’m going to go back to the form, and I’m going
to resubmit Harvard to my fake Google search. And now, enter. Notice
the URL is server.php?q=Harvard,
and the page says Coming Soon. Now, let’s write some PHP code. One of the most powerful
things you can do in a dynamic website is actually
spit out what the user has typed. So here is my PHP code, rather–
well, it’s sort of meaningless to call it that, because there is no PHP.
Let me go to server.php. Instead of Coming Soon,
let me do something like, “You wanted to search
for,” then a bold tag, and let me really cheat
now: harvard. Save this. All right. Now, nobody should be fooled
by this. When I go back here, go back, do I have
to reload the form? No, because I only changed
the server.php file. You don’t need to
refresh everything; I didn’t change google.html. Let me go ahead and click
Google Search. Oh my God, we now have a dynamic website. I typed Harvard and Harvard
appeared on the screen. But not really, right? Because if I go back
again, type in Yale, and click Google Search, OK,
I’m clearly cheating. So let’s be a little
more genuinely dynamic. Let’s go here, and I don’t
want to spit out Harvard. But based on the discussion
of superglobals earlier, where in the world can we
find what the user typed in for q? Yeah, go ahead, yup. [ Inaudible Remark ] In the $_GET superglobal, yeah. So let’s do this: we now need to insert the value
of that variable. And you might just want to do
this: $_GET. Here is the syntax for going into a superglobal. You do square bracket,
quote, the name of the thing you want
to get, quote, close bracket: $_GET["q"]. All right, so this is
a superglobal itself, but more specifically it’s
an associative array, otherwise known
as a hash table, hash map, whatever you’re familiar with. And that means you index into
it using not numbers but strings, the keys, and what you get out
of it are the values. So in this case, we should
get back H-A-R-V-A-R-D or Y-A-L-E. But not quite. So, let me try this just
to prove that I’m wrong. Let me go back here and
rerun the search. OK, clearly not what
I want. I need to tell the server,
here is PHP code. Otherwise, it’s just
cryptic-looking English. The means by which I do that
is I have to enter PHP mode: open angle bracket, question
mark, php, and a space, that is, <?php. And then at the end,
kind of the opposite, question mark, close angle bracket: ?>. If you come from the world of
ASP on Windows or JSP in Java, you might have seen similar
tags. This just means enter PHP mode, do
something, exit PHP mode. So let’s see what the
end result is here. Let me go back to Google,
refresh, and Google search for Yale. Interesting. What is missing here now? What did I do wrong? Yeah?>>Well, you have to actually
output the value from $_GET.>>Exactly. So think about any programming
language you know. Generally, if you want to print
the value of a variable, it’s not sufficient
just to write the name of the variable in your program.>>Echo.>>Echo would work, but we
have a couple of options here. We can literally say echo,
or we can say print, with parentheses,
as though we were making a function call. I’ll go with print for now, but echo is also
a viable option. Now we’re explicitly telling
the interpreter: print the value of this variable here. So let’s go back to my
browser, go back, resubmit Yale. And now we have
some dynamism to it. Yeah.>>Is there a difference
between echo and print?>>Is there a difference
between echo and print? Not really. Both are language constructs rather than true functions; print always returns 1, and echo can take several comma-separated arguments. There are crazy people
on the Internet who have done benchmarks
comparing print and echo, and every blog post
that I’ve read pretty much says
they’re equivalent, except for microseconds,
or milliseconds if you’re echoing millions of
things. For all intents and purposes, they’re the same. So we can do something
else here. And now, this is a religious
thing that I’m sure some people on the Internet will
hate me for saying, but I’ve always thought
this is an atrocious construct for saying enter PHP mode. And indeed, PHP also supports
what are called short tags: open angle bracket, question
mark, and that’s it. Now, there are corner
cases you can get into, and if you read the crazy
religious debates online, you’ll see that one of the reasonably
compelling arguments against them is that if a web server is not
configured with support for short tags (and this is
a short tag because why? It’s shorter than
what I previously typed), then you do run the risk of having your raw PHP code
transmitted to the user as though it’s just
HTML or the like, at which point you’ve
compromised the sanctity of your intellectual
property, or worse, your usernames and passwords. So that’s a legitimate concern. But if you are running your own
web server and have control over the short tags feature, the short_open_tag setting
in a file called php.ini, which is a config file that I think
we mentioned briefly on Monday and that will be on the
appliance for you to tinker with, it’s your call. Frankly, I just think
there’s an elegance about the symmetry of this. But typically when
you’re writing code that won’t necessarily
run on your own server but could be posted
as open-source code, or you’re writing it
for a corporate project where you don’t have control
over the web servers themselves, the first way I did it, with
<?php, is the preferred way, because it’s more portable;
it’s not going to break. The worst thing
is if you download code that someone else has written,
and it’s all short tags, and your web server
doesn’t support short tags, and you might not
control your web server because it’s a third-party web
host. It’s a pain in the neck: you have to go through thousands
of lines of code
changing short tags to long tags. So just FYI, you’ll
see both tricks online. So this is nice, but can
we do better than this? Well, let’s actually try
something a little more general. Let me go in here instead,
create a new form, and do a few different
data types this time. Let me go ahead and paste
this in just to get it started. And then I’ll have
a registration form, titled Google Registration, whose action, this time,
will be register.php. And let’s do a few
things this time. I’m going to do input,
name equals name. Let’s
do this quick and dirty, a registration form for, like,
a conference or student group or something like that: input name equals
name, type equals text. And now, let’s do
a line break here, and let’s do
something else here, like Gender, and
let’s do this with check– or rather, radio buttons,
for something like gender. So, input name equals gender, type equals radio, value equals
M for male, and I’ll say M here. And then over here,
input type equals name– nope, gender– nope: name equals
gender, type equals radio, value equals F, and
now I’ll put F here. And then should we do
one more? Let’s do just a simple drop-down
down here. Let’s do a select, name
equals state, close select. And inside it, option value
equals CT, Connecticut, close option, and likewise MA for Massachusetts. So our registration form, for whatever reason, will only
support people from Connecticut or Massachusetts, just so we don’t get bored
typing them all out. OK, so I’ve made a very
quick and dirty form in, sadly, a file called google.php, so I’ll restore that later so you can have the
original code back. Let’s go ahead and save
this as something else: register.html. OK. So, now let me pull
this up in my browser. server.php in the URL is going to
change to register.html. OK. So there we have a pretty
atrocious-looking website. And in fact, I’ve omitted one
of the more important pieces. So, what do we need? It’d be nice if we
had a submit button. So, let’s go in here:
input type equals submit, value equals Register,
close bracket, reload. OK. So, there’s our
very simple website. It’s a little more interesting
than our fake Google site, because at least now we have a
couple of user input mechanisms that we didn’t have before. So, let’s now look at what, on the back end,
we’re going to get. So, first, let me fill
this out as a sample: David, male, and we’ll change
this to Massachusetts, and now I’m going
to click Register. But let me zoom out so we
can see the URL change. Register, and now register.php
was not found on the server, but that makes sense, because
we haven’t created it yet. So, let’s go ahead and do that. Let me go back to
my text editor, copy this temporarily,
make a new file, paste this in, and call it register.php. And I want to say
Registered here, and then something like
Hello, followed by an open PHP tag, print($_GET["name"]), and a close
PHP tag. So, let’s take this
one step at a time. First, I’m just going
to say hello to whoever it was
that registered, OK? So, let’s get back
over to the browser. We’ll go back, resubmit the form, and, damn it, same bug again. So, quick, how do we fix this? Writable by group,
that was the problem. chmod.>>G.>>G.>>Plus.>>Minus: chmod g-w register.php. OK, fixed. Let me go back here. And notice the status code we got there:
500. 500 is generally the worst; it means the server really
did something wrong. All right. So, let’s go back here. Let’s reload the form, and
voila: Hello, David. OK, so some progress there. So that’s good. And let me introduce one
other syntactic trick. Frankly, this isn’t the
prettiest way of printing out a single value. There is this trick you
can do with short tags, which is very compelling. If you want to insert
the value of a variable, you can put open angle bracket,
question mark, equal sign, with no space in between them: <?= $_GET["name"] ?>. (As of PHP 5.4, <?= works even when short tags are otherwise disabled.) So, just to confirm, let me go
back to the page and reload, and it seems to be staying
the same, which is great. Now, let’s look at the URL. It’s more complex than Google’s,
because we have multiple inputs: name=David, gender=M, state=MA. How do we get access
to these other values? Well, first let’s do a quick and
dirty thing and just look at the entire contents of $_GET. So, let me go into
register.php, and I’m going to cheat now. I’m going to
output a pre tag, for preformatted
text, which uses a monospaced font, just so everything looks like code. And what I’m going to do in here
is <?= $_GET ?>. But this isn’t quite right. Actually, let me put
this on this line. So, you would think
this will just print out the entirety of $_GET. But let’s see what
I see instead. If I go here and reload: OK, not that enlightening. It just says Array. But that makes sense, because
I did say $_GET is an array. So, we need to print
it recursively to see what’s inside the array. And the trick you can use, and this is not generally
for production code, is that you don’t say print; you
say print_r, for recursive, and it’s a wonderful way of
taking a quick peek inside of variables. So, I’m going to
go to register.php, reload, and there we go. So, this is what it looks like. This is completely
arbitrary formatting. This has nothing to do with
the underlying implementation; it’s just a pretty way of
printing the information. And now, I see three
keys, name, gender, and state, followed by their three values. So, this is just a
nice sanity check as to what’s actually in there. So, now I can do
something like this. Let me go back into
register.php and go back to saying h1, Hello, then <?=
$_GET["name"] ?>, exclamation point, close h1. Now, let me do this again, and
I’m going to say something like You are a– this is going to be a little underwhelming
at first. Let’s just do gender and
then close that h1 tag. And then finally,
You are from, and the state. So, this should hopefully
follow logically from what we did a moment ago. So, let’s reload now. The fonts are a little big, and it’s not the most user-friendly
thing, but at least we’re on our way. However, notice that there
is no security mechanism in place here right now. There is no sanity
checking of the user’s input. And notice, we used GET, so,
recall, the URL looks like this. So, what if I instead
do something like this, editing the URL by hand? Now this is not a
correct website. So, there are opportunities
here, right? There are opportunities to, one,
make sure on the server side that what the user submits
is actually one of the options we provided. Two, we can make it
more user-friendly. It’d be nice if Massachusetts
said Massachusetts, not MA. It’d be nice if the M
became male, in lowercase, or if it said You are a guy or You are
a girl or just something. So, there seem to
be opportunities here for ifs and
elses and some kind of conditional checks
and so forth. So, we can build up from there. But one of the most important
takeaways is that right now, we’re just trusting what the
user has submitted to the form, and this in and of itself
is not a good assumption, because we can do something
even worse than this. This is a very common
thing known as a cross-site scripting attack, or XSS, which we’ll talk about more
toward the end of the semester. But if you’re familiar with
JavaScript even minimally, what if I do something crazy
like this: a script tag, then alert, You have been hacked,
then a close script tag. OK, that’s my name,
I claim, all right? So, what’s going
to happen now? Well, given my server-side
code, what am I doing with the name parameter? Yeah.>>You’re closing
the script tag?>>I’m closing the script tag, but, well, actually,
you’ll notice here I closed it, but I
also opened it. What am I doing in
register.php with that value? Yeah.>>You’re actually
going to get– you’re going to send
that string to the user, and the user [inaudible]
is going to interpret it as JavaScript.>>Exactly. I’m literally going to spit
out what the user typed. But if the user typed HTML,
that’s going to be added to the page, and that HTML
is going to be interpreted, and
if it’s a script tag, it means the JavaScript
code is going to run. So, in short, what we just did
is amazingly simple, too simple. Very bad; this
is not good code. And many websites
make this mistake. Watch what happens now. If I go here and click Register:
uh-oh, what did I do wrong? Register [inaudible] script.
All right, stand by one second. My dramatic alert, You have been hacked. Hmm, Chrome, are you
doing this to me? [ Inaudible Remark ] That should be OK
where we put it.>>Semicolon.>>Semicolon. Let me go back. You have been hacked. That should be OK. Let me try
one other thing, otherwise this is going to be a
very underwhelming– type equals– oh my God. All right. Stand by for one second. We’re going to try
one other thing here. Otherwise, you will never
believe anything else I say. OK, 192.168.151.128, and then register.html. OK, so before I tell
you what I just did, we’re going to try this again. Script, alert, You have
been hacked, Massachusetts. Oh, damn you, Chrome. OK, Google has been too
helpful for its own good. So, Chrome is detecting what
we just did and is apparently scrubbing it for us, which
is rather good and bad of them. So, this was the effect
I was trying to create. I very quickly opened
up Firefox instead, which apparently doesn’t have
this protection in place, and this is now the
behavior we wanted. And as soon as I click OK,
we should at least see some of the behavior I expected,
but not quite all of it. Now, this is stupid, right? You’re an idiot if you’re
trying to trick yourself into executing JavaScript
alerts. This is not really
threatening anyone other than myself. However, if you think
about how we did it, notice what’s in the URL there. So, apparently you can
trigger these kinds of tricks by typing input
manually into forms, but that’s the silly
way of doing it. What if instead you are a
bad guy, and you’re doing, like, a phishing attack,
sending people bogus emails and telling
them to click a link, and they don’t necessarily
see the whole link because it’s hidden with
HTML email formatting? They click that link,
they get led to my page, and then some JavaScript
code executes. Well, this is stupid
JavaScript. Triggering an alert
is not hacking anyone. But as we’ll see in a few
weeks, with JavaScript you also have access to a
user’s cookies, which means there are attacks,
which we’ll talk about later in the semester, whereby you can
steal someone’s session cookie, hijacking their session
in the same way we discussed on Monday with Firesheep
and Starbucks and the like, by having tricked the user
into typing or clicking a link that takes advantage
of this failure to escape the user’s input. So, the fix here is
actually relatively simple, if tedious, in my code: you
never, ever, ever, ever want to trust what
the user has typed in. So, the real way to echo user
input is something like this: htmlspecialchars($_GET["name"]). htmlspecialchars is an
annoyingly long function name, but it is a very good function, in that it will ensure that
any potentially dangerous characters, among them the open angle
bracket, which, as you know, marks the start of an
HTML tag, will be escaped, that is, converted to an HTML entity. So now, if I go back and
resubmit the exact same form, I look like the idiot,
because the page is displaying exactly
what I typed in, which you would think is the
expected behavior anyway. So, one of the recurring themes
that we’ll discuss, not just at the end of the semester
but throughout, is how to take advantage of
things like escaping, both for user input here
and for JavaScript inputs, and most importantly
for database inputs, so that ultimately you are not
vulnerable to attacks like this. So, what did I do
to work around this? In Firefox, notice my
URL is very different. In Firefox, what URL did I
use to visit the same website? Yeah. So, go ahead, what is it? [ Inaudible Remark ] Yeah, this private
IP, 192.168.151.128. Where did that come from? Well, the CS50 Appliance, the virtual machine
I’ve been running, is just a computer
on the network, albeit a virtual one,
and I’m running it in a program called VMware,
which is, again, a hypervisor that allows you to run one
operating system on another. Notice in the bottom right-hand
corner of the appliance, there is mention
of my IP address. And this can change
from time to time. VMware, in this case, is acting as the so-called DHCP server,
potentially giving the appliance a different IP every
time I turn it on. So this is just a
configuration we put here to always remind the human
what IP address he or she has. So, what is the implication?
This is nice, because it
means I can, as I promised on Monday, minimize the
appliance altogether and not even have to worry about
getting too comfortable with the actual Linux
environment. I can just treat
this as a remote server. It’s remote in
the sense that I can treat it
as though it’s remote, even though it’s actually physically present, and I can still address it
via an IP address here. And if I’m on my own Mac or
PC, depending on your OS, I can now just visit that
actual URL with the browser, as though I’m
visiting a remote server. And if I’m really particular,
and I just don’t like looking at this address, what
I can do is what I did on Monday, whereby I
open up a terminal window, edit /etc/hosts,
and type in my password. And then, remembering the
trick we did on Monday, I can map
davidsecretwebsite.com to that IP. And now, because I’ve taught my
Mac to map that domain name to that IP address for me, I can change this
to this, and now notice, davidsecretwebsite.com is born, albeit only on my
own local computer. So, as I mentioned earlier,
you can do development on your own computer, and
it’s a wonderful way of doing website development because you can still
simulate all of the realities
of HTTP and DNS, but locally: without needing
an internet connection, without needing a remote server,
without having to pay anyone for those services. You can
spend those months up front working at home, at a café, or at
work, all without needing any of the physical infrastructure
that’s typically associated with the internet. So, we’ll also introduce you, in the first project,
to this approach. But we’ve only just
scratched the surface. So far, all I’ve been doing
is generating output, but clearly websites like
Facebook and Google take input, check that input with
if conditions and elses and loops and whatnot, and do things like writing
to databases. And it would also be nice
to move away from what seems to be a very sloppy start, whereby I’ve been writing
HTML, then dropping into PHP mode very quickly,
then going back to HTML. This is not going
to scale very well. So, as you may know if you’re coming to the
course with a background in ASP or JSP or even Django
or Rails, there are ways of cleaning up our code so
that we can practice some good principles, like
keeping presentation separate from our data. This is one of those mantras
that makes good sense, especially for large projects, where
you keep your HTML separate from your CSS, separate
from your JavaScript, separate from your data, and now separate from
your PHP code. So, even though tonight
we’ve started to dive in with this commingling
approach, and on Monday we’ll do some more of
the same, we’ll also look at some common paradigms,
among them MVC, Model-View-Controller,
where you can really start to separate these
things into more complex, more sophisticated, or rather more
cleanly designed applications. But for now, why
don’t we go ahead and adjourn here officially? We’ll take a 5- or 10-minute break. Peter will get set up; if you’d like to remain for
section, by all means do. Otherwise, section will
be filmed as usual and placed online
sometime tomorrow. And I’ll linger around
for one-on-one questions. All right, we’ll
see you on Monday. [ Silence ] END
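The lecture’s point about escaping user input separately for HTML, JavaScript, and database contexts can be illustrated with a short sketch. It is written in Python rather than the course’s PHP, and the function names and the `users` table are hypothetical; it only shows the general idea of escaping for each output context and using a parameterized query instead of pasting input into SQL.

```python
import html
import json
import sqlite3

def render_greeting(name):
    # HTML context: html.escape turns <, >, &, and quotes into entities,
    # so user input is displayed as text instead of being parsed as tags.
    return "<p>Hello, " + html.escape(name) + "</p>"

def js_string_literal(value):
    # JavaScript context: json.dumps produces a quoted, escaped string literal.
    # (Inside an HTML <script> block, "</script>" would still need extra care.)
    return json.dumps(value)

def find_user(conn, username):
    # Database context: the ? placeholder keeps the value out of the SQL text,
    # so input like "' OR '1'='1" is compared literally, not executed.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES (?)", ("david",))
    print(render_greeting("<script>alert('hi')</script>"))
    print(js_string_literal('He said "hi"'))
    print(find_user(conn, "david"))
    print(find_user(conn, "' OR '1'='1"))
```

The injection-style string returns no rows because the database treats it as an ordinary username to match, not as part of the query.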
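The appliance address mentioned in the lecture, 192.168.151.128, is a private address: it falls inside one of the three IPv4 blocks that RFC 1918 reserves for private networks, which is why it is reachable from the host machine but not from the public internet. A small sketch using Python’s standard `ipaddress` module can confirm which block an address belongs to; the helper name `rfc1918_block` is hypothetical.

```python
import ipaddress

# The three IPv4 blocks that RFC 1918 reserves for private networks.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def rfc1918_block(addr):
    """Return the RFC 1918 network containing addr, or None if addr is not in one."""
    ip = ipaddress.ip_address(addr)
    for net in RFC1918_NETWORKS:
        if ip in net:
            return net
    return None

if __name__ == "__main__":
    for a in ["192.168.151.128", "8.8.8.8"]:
        block = rfc1918_block(a)
        print(a, "->", block if block else "not an RFC 1918 address")
```

A DHCP server like the one VMware provides typically hands out addresses from such a private range, which is why the appliance’s IP can differ between boots while staying local.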
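Editing /etc/hosts, as done in the lecture for davidsecretwebsite.com, works because the operating system typically consults that file before querying a DNS server. The sketch below models that lookup in Python under simplifying assumptions: the hosts text is a hypothetical example pointing the lecture’s domain at the appliance’s IP, `parse_hosts` and `resolve` are illustrative names, and the DNS step is a stand-in function. Real resolvers and their lookup order are configured by the OS and are more involved.

```python
def parse_hosts(text):
    """Build a {hostname: ip} table from hosts-file text (an IP followed by one or more names)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # hostnames are case-insensitive; first entry wins
    return table

def resolve(name, hosts_table, dns_lookup):
    # Consult the local hosts table first; fall back to DNS only if the name is absent.
    ip = hosts_table.get(name.lower())
    return ip if ip is not None else dns_lookup(name)

# Hypothetical hosts-file contents for illustration.
HOSTS = """
127.0.0.1        localhost
# point the lecture's domain at the appliance's private IP
192.168.151.128  davidsecretwebsite.com
"""

if __name__ == "__main__":
    table = parse_hosts(HOSTS)
    fake_dns = lambda name: "(would query a DNS server for " + name + ")"
    print(resolve("davidsecretwebsite.com", table, fake_dns))
    print(resolve("google.com", table, fake_dns))
```

Because the mapping lives only in this one machine’s file, the name works in that machine’s browser and nowhere else, matching the “only on my own local computer” caveat.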
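The closing idea of keeping data separate from presentation, and the MVC pattern the lecture previews, can be sketched in a few lines. This is a deliberately tiny Python illustration, not the course’s PHP framework; `User`, `UserModel`, `render_user_list`, and `list_users_controller` are hypothetical names. The model only knows about data, the view only turns data into escaped HTML, and the controller connects the two.

```python
import html
from dataclasses import dataclass

@dataclass
class User:
    name: str

class UserModel:
    """Model: owns the data (an in-memory list standing in for a database)."""
    def __init__(self):
        self._users = [User("Alice"), User("Bob")]

    def all(self):
        return list(self._users)

def render_user_list(users):
    """View: turns data into HTML, escaping every value it prints."""
    items = "".join(f"<li>{html.escape(u.name)}</li>" for u in users)
    return f"<ul>{items}</ul>"

def list_users_controller(model):
    """Controller: handles the 'list users' action by asking the model for data and passing it to the view."""
    return render_user_list(model.all())

if __name__ == "__main__":
    print(list_users_controller(UserModel()))
```

Swapping the in-memory list for a real database, or the HTML view for a JSON one, would change only one of the three pieces, which is the practical payoff of the separation.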
